Abstract: In this paper, a novel approach to the problem of estimating the heavy-tail
exponent $\alpha > 0$ of a distribution is proposed. It is based on the fact that block-maxima
of size $m$ of the independent and identically distributed data scale at a rate of $m^{1/\alpha}$.
This scaling rate can be captured well by the max-spectrum plot of the data that leads
to regression based estimators. Consistency and asymptotic normality of these estimators
are established under mild conditions on the behavior of the tail of the distribution. The
results are obtained by establishing bounds on the rate of convergence of moment-type
functionals of heavy-tailed maxima. Such bounds often yield exact rates of convergence 
and are of independent interest. Practical issues on the automatic selection of tuning 
parameters for the estimators and corresponding confidence intervals are also addressed. 
Extensive numerical simulations show that the proposed method proves competitive for 
both small and large sample sizes and for a large range of tail exponents. The method is 
shown to be more robust than the classical Hill plot and is illustrated on two data sets of 
insurance claims and natural gas field sizes. 
1. Introduction

Heavy-tailed distributions arise in many diverse scientific areas: insurance claims, high-speed 
network traffic, hydrology, the topological structure of the World Wide Web and of social 
networks, linguistics, just to name a few (see e.g. Adler et al. (1998), McNeil (1997), Resnick
(1997b), Faloutsos et al. (1999), Adamic and Huberman (2000, 2002), Zipf (1932, 1949), Tsonis
et al. (1997)). Highly optimized physical systems also exhibit heavy-tailed behavior, as
discussed in Carlson and Doyle (1999).

A real valued random variable $X$ with cumulative distribution function (c.d.f.) $F(x) =
\mathbb{P}\{X \le x\}$, $x \in \mathbb{R}$, is said to have a (right) heavy tail if

$$ \mathbb{P}\{X > x\} = 1 - F(x) = L(x)\,x^{-\alpha}, \quad \text{as } x \to \infty, \tag{1.1} $$

for some $\alpha > 0$, where $L(x) > 0$ is a slowly varying function. The tail exponent $\alpha > 0$
controls the rate of decay of the tail $1 - F(x)$ and hence characterizes the tail behavior of $F$.
The problem of estimating the tail exponent has attracted a lot of attention in the literature
since it poses numerous theoretical as well as practical challenges (de Haan et al. (2000) and
de Sousa and Michailidis (2004)). Most approaches focus on the scaling behavior of the largest
order statistics $X(1;N) \ge X(2;N) \ge \cdots \ge X(N;N)$ obtained from an independent and identically
distributed (i.i.d.) sample $X(1), \ldots, X(N)$ from $F$. Typical examples include Hill's estimator
(1975), its numerous variations (Kratz and Resnick (1996), Resnick and Starica (1997)), and 
the kernel-based estimators of Csorgo et al. (1985) (see also Feuerverger and Hall (1999)). For 
example, the Hill estimator, which is one of the most widely used estimators in practice, can 
be written as 

$$ \hat\alpha_H(k) = \Big( \frac{1}{k} \sum_{i=1}^{k} i\big(\ln X(i;N) - \ln X(i+1;N)\big) \Big)^{-1} =: \Big( \frac{1}{k} \sum_{i=1}^{k} Y_i \Big)^{-1}, \tag{1.2} $$

where $Y_i := i(\ln X(i;N) - \ln X(i+1;N))$. As shown in Weissman (1978), assumption (1.1)
implies that for all fixed $k$'s, the vector $\{Y_i\}_{i=1}^{k}$ converges in distribution to a vector of
independent exponentially distributed variables with mean $1/\alpha$. Therefore, when both $N$ and $k$
are large, the statistic $\hat\alpha_H(k)^{-1}$ in (1.2) behaves like the sample mean of a sample of independent
exponential variables. This suggests that the estimator $\hat\alpha_H(k)$ is consistent (Mason (1982))
and, under some additional conditions on the tail behavior of $F$, asymptotically normal (Hall
(1982)). In practice, one relies on plotting $\hat\alpha_H(k)$ as a function of the number of order statistics $k$
(Hill plot) and then selecting an appropriate value for $k$ (see the example figure). In the case of the
Pareto distribution ($F(x) = 1 - (x/\sigma_0)^{-\alpha}$, $x > \sigma_0$, $\sigma_0 > 0$), the Hill estimator is also a
conditional maximum likelihood estimator. However, when deviations from this ideal case occur, it
exhibits substantial bias and the resulting plot can be misleading (see examples and discussion
in de Haan et al. (2000) and de Sousa and Michailidis (2004) and references therein). These
shortcomings were addressed in a series of papers that introduced modifications of the original
Hill estimator and the resulting Hill plot. The kernel-type estimators introduced by Csorgo
et al. (1985) extend the Hill estimator by introducing non-uniform weights in (1.2) (see also
Groeneboom et al. (2003)). Namely, given a non-negative and non-increasing kernel function 
$K(x)$, $x > 0$, one considers



for some $\lambda > 0$. The Hill estimator can be recovered as a special choice of the function $K$.
Observe also that the threshold parameter $k$ in (1.2) is no longer present. The choice of the
kernel function and the bandwidth parameter $\lambda > 0$, however, remains an important and difficult
problem for the kernel estimators, similar to the choice of k for the Hill estimator. One practical 
disadvantage of kernel-type estimators is that no analogue of the Hill plot exists. Therefore, 
one cannot readily judge how reliable the resulting numerical estimates are. 
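
To make formula (1.2) concrete, the following is a minimal sketch of the Hill estimator in plain Python. The function names `hill_estimator` and `pareto_sample` are illustrative and do not come from the paper; the sketch assumes strictly positive data without ties among the top $k+1$ order statistics.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the tail exponent based on Y_1, ..., Y_k, as in (1.2)."""
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    # Order statistics in decreasing order: x[0] = X(1;N) >= x[1] = X(2;N) >= ...
    x = sorted(sample, reverse=True)
    # Y_i = i * (ln X(i;N) - ln X(i+1;N)), i = 1, ..., k
    y = [i * (math.log(x[i - 1]) - math.log(x[i])) for i in range(1, k + 1)]
    return 1.0 / (sum(y) / k)


def pareto_sample(n, alpha, sigma0=1.0, seed=0):
    """Hypothetical helper: exact Pareto draws via the inverse transform."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the negative power is well defined.
    return [sigma0 * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


if __name__ == "__main__":
    data = pareto_sample(20000, alpha=1.5)
    # A crude textual "Hill plot": the estimate as a function of k.
    for k in (200, 1000, 5000):
        print(k, round(hill_estimator(data, k), 3))
```

The sort dominates the cost here, which is the $O(N \log N)$ step that any estimator based on order statistics requires.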

Other important and popular estimators include the Pickands estimator (see, Pickands 
(1975) and Dekkers and de Haan (1989)) and de Haan's moment type estimator (see Dekkers 
et al. (1989)). Resnick and Starica (1997) introduced a modified and smoothed version of the 
Hill plot and showed that it performs better in practice when the data depart from the Pareto 
model (see also de Haan et al. (2000)). The consistency of estimators based on this alternative 
Hill plot is also established for dependent data (see, Resnick and Starica (1995)). 

In this study, we propose a novel method for estimating the tail index $\alpha$. It relies on the
concept of max self-similarity. We focus on the case when the slowly varying function in
(1.1) is asymptotically constant and consider block-wise maxima of i.i.d. random variables
$X(1), X(2), \ldots$ with c.d.f. $F$. Block-maxima of block size $m$ scale at a rate of $m^{1/\alpha}$, as
$m \to \infty$. Therefore, we can obtain an estimate of $\alpha$ by focusing on a sequence of growing,
dyadic block sizes $m = 2^j$, $1 \le j \le \log_2 N$, $j \in \mathbb{N}$, and estimating the mean of logarithms of
block-maxima (log-block-maxima). This is achieved by examining the max-spectrum plot of
the data, defined as means of log-block-maxima as a function of the logarithm of the block-size.
The slope of the max-spectrum plot for large block-sizes yields an estimate of $1/\alpha$ (see
Figure 1 below).

When the $X(i)$'s come from a Fréchet distribution, then their block-maxima have the same
Fréchet distribution, rescaled by $m^{1/\alpha}$, where $m$ denotes the block size. Thus, in practice, the
max-spectrum plot is essentially linear (Figure 2). One can view i.i.d. Fréchet sequences as
max self-similar with self-similarity parameter $1/\alpha$ (Definition 2.1). Due to this exact max
self-similarity property, our estimation framework works best for Fréchet data. On the other
hand, the Hill-type estimators work best for Pareto data. This also shows the fundamental
difference between the two approaches. In many important applications the Hill plot is rather
volatile. The max spectrum turns out to be more robust to outliers in the data or to deviations
from its corresponding ideal Fréchet model than the Hill plot. In Section 5.3 we examine two
data sets: (i) 2,167 insurance claims due to fire losses in Denmark and (ii) volumes of natural
gas reserves in 406 oil-rich provinces. In both cases, the max self-similarity estimators yield
values consistent with previous detailed studies of these data sets (see McNeil (1997) and de
Sousa and Michailidis (2004), respectively). These values depart from values that one obtains
directly from the Hill plots. In fact, in case (ii), due to the peculiar discrete nature of the data
set, the Hill plot has a saw-tooth shape and is particularly hard to interpret, whereas the
max spectrum plot appears to yield a reliable estimate.

The remainder of the paper is structured as follows. In Section 2 we introduce the max-spectrum
plot and the self-similarity estimators of the heavy-tail exponent $\alpha$ and establish
their basic properties in the ideal Fréchet setting. Some useful results on rates for moment-type
functionals of heavy-tailed maxima are presented in Section 3. These results are used to prove
the consistency and asymptotic normality of the max self-similarity estimators in Section 4. In
Section 5 the performance of the new estimators is examined through a simulation study. The
max self-similarity estimators are then shown to work well in the context of two challenging
real data examples where the classical Hill plot is rather volatile and is hard to interpret.

In this section, we introduce some notation and recall some basic definitions used in the
remainder of the paper. We then introduce estimators of the heavy-tail exponents based on max
self-similarity and discuss their basic properties in the ideal Fréchet case.

2.1. Definition and basic properties 

We focus on the case where the slowly varying function $L$ in (1.1) is trivial, that is, when

$$ \mathbb{P}\{X > x\} = 1 - F(x) \sim \sigma_0^{\alpha} x^{-\alpha}, \quad \text{as } x \to \infty, \tag{2.1} $$

with $\sigma_0 > 0$ and where $\sim$ means that the ratio of the left-hand side (l.h.s.) to the right-hand
side (r.h.s.) in (2.1) tends to 1, as $x \to \infty$. For simplicity, we further assume that the $X(i)$'s
are almost surely positive ($F(0) = 0$). We address the general case where the $X(i)$'s can take
negative values in Section 4 (see Proposition 4.3).

We begin with some useful definitions: for an i.i.d. sample $X(i)$, $i \in \mathbb{N} := \{1, 2, \ldots\}$ from
$F$, consider the sequence of block-maxima

$$ X_m(k) := \max_{1 \le i \le m} X(m(k-1) + i) = \bigvee_{i=1}^{m} X(m(k-1) + i), \quad k = 1, 2, \ldots, $$

with $m \in \mathbb{N}$, where $X_m(k)$ is the greatest observation in the $k$-th block. The Fisher-Tippett-Gnedenko
Theorem (see e.g. Proposition 0.3 in Resnick (1987)) then implies that, as $m \to \infty$,
$m^{-1/\alpha} X_m(k)$ converges in distribution to a random variable $Z$ with an $\alpha$-Fréchet distribution.
More precisely,

$$ \mathbb{P}\{Z \le x\} = \exp\{-\sigma_0^{\alpha} x^{-\alpha}\}, \quad x > 0, \tag{2.2} $$

where $\sigma_0 > 0$, called the scale coefficient of $Z$, is as in (2.1). In fact, as $m \to \infty$, we have

$$ \Big\{ \frac{1}{m^{1/\alpha}} X_m(k) \Big\}_{k \in \mathbb{N}} \stackrel{d}{\longrightarrow} \{Z(k)\}_{k \in \mathbb{N}}, \tag{2.3} $$

where the $Z(k)$'s are independent copies of $Z$ and where $\stackrel{d}{\longrightarrow}$ denotes convergence of the
finite-dimensional distributions. Thus, for large values of $m$, the normalized block-maxima behave
like a sequence of i.i.d. $\alpha$-Fréchet variables. In fact, when the $X(k)$'s are $\alpha$-Fréchet, (2.3)
holds with equality for all $m \in \mathbb{N}$ (see Relation (7.3) in the Appendix). The sequence of i.i.d.
$\alpha$-Fréchet $X(k)$'s is thus max self-similar in the sense of the following definition.

Definition 2.1 A sequence of random variables $X = \{X(k)\}_{k \in \mathbb{N}}$ (defined on the same
probability space) is said to be max self-similar with self-similarity parameter $H > 0$, if for any
$m \in \mathbb{N}$,

$$ \Big\{ \bigvee_{i=1}^{m} X(m(k-1) + i) \Big\}_{k \in \mathbb{N}} \stackrel{d}{=} \{ m^{H} X(k) \}_{k \in \mathbb{N}}, \tag{2.4} $$

where $\stackrel{d}{=}$ denotes equality of the finite-dimensional distributions.

If the $X(k)$'s are i.i.d. but not Fréchet, then Relation (2.3) indicates that (2.4) holds
asymptotically, as $m \to \infty$, with $H = 1/\alpha$. Thus, any sequence of i.i.d. heavy-tailed variables can
be regarded as asymptotically max self-similar with self-similarity parameter $H = 1/\alpha$. This
feature suggests that an estimator of $H$, and therefore of $\alpha$, can be obtained by focusing on the
scaling of the block-maxima of growing block sizes. Crovella and Taqqu (1999) used a similar
idea based on the scaling of block-wise sums to estimate a heavy-tail exponent $\alpha$ when
$\alpha \in (0,2)$.
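
The exact max self-similarity of i.i.d. $\alpha$-Fréchet sequences can be verified directly from (2.2) and independence. The short derivation below is a sketch for a single block; the joint statement over $k$ follows because distinct blocks involve disjoint sets of independent variables.

```latex
\begin{align*}
\mathbb{P}\Big\{ \bigvee_{i=1}^{m} X(i) \le x \Big\}
  &= \big( \mathbb{P}\{X(1) \le x\} \big)^{m}
   = \exp\{-m\,\sigma_0^{\alpha} x^{-\alpha}\} \\
  &= \exp\big\{-\sigma_0^{\alpha} (x/m^{1/\alpha})^{-\alpha}\big\}
   = \mathbb{P}\{ m^{1/\alpha} X(1) \le x \}, \qquad x > 0 .
\end{align*}
```

Hence the block-maximum has the same law as $m^{1/\alpha} X(1)$, which is the max self-similarity relation with $H = 1/\alpha$.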

Given an i.i.d. sample $X(1), \ldots, X(N)$ from $F$, we consider

$$ D(j,k) := \max_{1 \le i \le 2^j} X(2^j(k-1) + i) = \bigvee_{i=1}^{2^j} X(2^j(k-1) + i), \quad k = 1, 2, \ldots, N_j, \tag{2.5} $$

for all $j = 1, 2, \ldots, [\log_2 N]$, where $N_j := [N/2^j]$ and $[x]$ denotes the largest integer not greater
than $x \in \mathbb{R}$. By analogy to the discrete wavelet transform, we refer to the parameter $j$ as the
scale and to $k$ as the location parameter. We consider dyadic block-sizes for algorithmic and
computational convenience (for more details, see Stoev et al. (2006)).

Observe that for any fixed $j$, the block-maxima $D(j,k)$ are independent in $k$ since they
involve maxima over non-overlapping blocks of the $X(i)$'s. Moreover, as argued above, when
the $X(i)$'s follow an $\alpha$-Fréchet distribution,

$$ \{D(j,k)\}_{k \in \mathbb{N}} \stackrel{d}{=} \{2^{j/\alpha} D(0,k)\}_{k \in \mathbb{N}} \equiv \{2^{j/\alpha} X(k)\}_{k \in \mathbb{N}}, \tag{2.6} $$

for any scale $j \in \mathbb{N}$. Introduce the statistics

$$ Y_j := \frac{1}{N_j} \sum_{k=1}^{N_j} \log_2 D(j,k), \quad j = 1, 2, \ldots, [\log_2 N], \tag{2.7} $$

and observe that by the Law of Large Numbers, the $Y_j$'s are consistent and unbiased estimators
of the expectations $\mathbb{E}\log_2 D(j,1)$, provided that these are finite. (Corollary 3.1 below establishes
that the $\mathbb{E}|\log_2 D(j,1)|$ are finite under general conditions on the c.d.f. $F(x)$.) In view of the
asymptotic max self-similarity (2.3) of $X$, relationship (2.6) holds approximately for large
scales $j$, and in fact,

$$ \mathbb{E}Y_j = \mathbb{E}\log_2 D(j,1) \simeq j/\alpha + C, \tag{2.8} $$

with $C = C(\sigma_0,\alpha) = \mathbb{E}\log_2(\sigma_0 Z)$, where $Z$ is an $\alpha$-Fréchet variable with unit scale coefficient as in
(2.2) above. Here $\simeq$ means that the difference between the l.h.s. and the r.h.s. tends to zero.

In practice, one can look at the max-spectrum plot of the statistics $Y_j$ versus $j$ (see Figure
1 below). In view of (2.8) it is expected that for large $j$'s the slope coefficient of a linear fit
of the $Y_j$'s against the $j$'s would yield an estimate of $H = 1/\alpha$. Further, observe that the
log-linear scaling relation in (2.8) becomes more precise the larger the scale $j$ (block-size $2^j$) and
holds exactly for all scales $j = 1, \ldots, [\log_2 N]$ when the $X(k)$'s come from an $\alpha$-Fréchet
distribution (see (2.6)).

Thus, given a range of scales $1 \le j_1 < j_2 \le [\log_2 N]$, we define the following regression-based
estimators of $H = 1/\alpha$ and $\alpha$:

$$ \hat H_w(j_1,j_2) := \sum_{j=j_1}^{j_2} w_j Y_j, \quad \text{and} \quad \hat\alpha_w(j_1,j_2) := 1/\hat H_w(j_1,j_2), \tag{2.9} $$

where the weights $w_j$ are chosen so that

$$ \sum_{j=j_1}^{j_2} w_j = 0 \quad \text{and} \quad \sum_{j=j_1}^{j_2} j\,w_j = 1. \tag{2.10} $$

It is easy to see that the linear estimators $\hat H_w$ in (2.9) with weights as in (2.10) are least squares
estimators in a linear regression model. In the rest of the paper, the estimators $\hat H_w$ and $\hat\alpha_w$ in
(2.9) are referred to as max self-similarity estimators.
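
As an illustration of the definitions of the max-spectrum and of the regression-based estimators, here is a minimal Python sketch. It uses ordinary least squares slope weights, which are only one admissible choice of weights summing to zero with $\sum_j j\,w_j = 1$; the function names and the Fréchet simulation helper are assumptions made for this example and are not part of the paper. The data are assumed strictly positive.

```python
import math
import random


def max_spectrum(x):
    """Return {j: Y_j}, the mean of log2 block-maxima over blocks of size 2^j."""
    n = len(x)
    spectrum = {}
    j = 1
    while 2 ** j <= n:
        m = 2 ** j
        n_j = n // m  # N_j = [N / 2^j] complete blocks
        logs = [math.log2(max(x[m * k: m * (k + 1)])) for k in range(n_j)]
        spectrum[j] = sum(logs) / n_j
        j += 1
    return spectrum


def ols_weights(j1, j2):
    """OLS slope weights: they sum to 0 and satisfy sum_j j * w_j = 1."""
    scales = list(range(j1, j2 + 1))
    mean_j = sum(scales) / len(scales)
    ss = sum((j - mean_j) ** 2 for j in scales)
    return {j: (j - mean_j) / ss for j in scales}


def max_self_similarity_estimate(x, j1, j2):
    """Return (H_hat, alpha_hat) as a weighted sum of the Y_j's over j1..j2."""
    y = max_spectrum(x)
    w = ols_weights(j1, j2)
    h_hat = sum(w[j] * y[j] for j in w)
    return h_hat, 1.0 / h_hat


def frechet_sample(n, alpha, seed=0):
    """Hypothetical helper: unit-scale alpha-Frechet draws, X = E^(-1/alpha), E ~ Exp(1)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0) ** (-1.0 / alpha) for _ in range(n)]


if __name__ == "__main__":
    data = frechet_sample(2 ** 14, alpha=1.5)
    print(max_self_similarity_estimate(data, 1, 8))
```

Equal-weight least squares ignores the growth of the variance of $Y_j$ with the scale $j$, so restricting the fit to scales with many blocks, as in the usage line above, keeps the sketch reasonably stable.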
Remark (Computational complexity)

The proposed estimators exhibit a significant computational advantage over Hill-type or
kernel-based estimators. Given a sample of size $N$ one can compute the max-spectrum $Y_j$,
$1 \le j \le [\log_2 N]$, with $Y_j$ as in (2.7), by using $O(N)$ operations, since $O(N/2^j)$ pair-wise maxima
and sums are computed for $j = 1, \ldots, [\log_2 N]$, and therefore
$O\big(\sum_{j=1}^{[\log_2 N]} N/2^j\big) = O(N)$
operations are done. On the other hand, methods involving order statistics require sorting the
sample, which results in $O(N \log_2 N)$ operations.
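
The linear-time computation described in this remark can be sketched as follows: each scale is obtained from the previous one by pairwise maxima, so the list being processed halves at every step. The function names are illustrative; `max_spectrum_direct` is included only as a slower reference that rescans every block at every scale.

```python
import math


def max_spectrum_dyadic(x):
    """Max-spectrum via repeated pairwise maxima; about N comparisons in total."""
    d = list(x)  # scale 0: D(0, k) = X(k)
    spectrum = {}
    j = 0
    while len(d) >= 2:
        # D(j+1, k) is the maximum of two adjacent scale-j blocks;
        # a trailing unpaired block is dropped, matching N_j = [N / 2^j].
        d = [max(d[2 * k], d[2 * k + 1]) for k in range(len(d) // 2)]
        j += 1
        spectrum[j] = sum(math.log2(v) for v in d) / len(d)
    return spectrum


def max_spectrum_direct(x):
    """Reference version: recompute every block maximum from the raw data."""
    n = len(x)
    spectrum = {}
    j = 1
    while 2 ** j <= n:
        m = 2 ** j
        n_j = n // m
        logs = [math.log2(max(x[m * k: m * (k + 1)])) for k in range(n_j)]
        spectrum[j] = sum(logs) / n_j
        j += 1
    return spectrum


if __name__ == "__main__":
    sample = [1.0 + ((7 * i) % 13) for i in range(13)]
    print(max_spectrum_dyadic(sample))
    print(max_spectrum_direct(sample))
```

Both functions return the same values; the dyadic version never sorts the sample and touches about $N/2^j$ entries at scale $j$.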

We now illustrate the nature of the max-spectrum plot and the resulting estimator using 
an example of Internet topology data. The data describe the degree of connectivity between 
autonomous systems (AS - networks under a single administrative authority) on the Internet 
for the year 2002 and is provided by the National Laboratory for Applied Network Research. 
The information has been used to characterize the topology of the Internet (see, e.g. Faloutsos 
et al. (1999) and Chen et al. (2002)). The size of the data set is 13,579 and each observation 
gives the number of connections of an AS to peer AS. The histogram of the data (in log-scale) 
shows that the vast majority of the AS are connected to very few peer systems, but there are 
a few AS that are directly connected to over 10% of their peer systems. The max-spectrum 
indicates a value for the tail index of about 1.5. The Hill estimator for $k = 80$ (where the Hill
plot seems to stabilize) suggests a value of 1.43.

Fig 1. Left panel: histogram (log-scale) of AS connectivities. Right panel: max-spectrum plot for the
AS connectivity data. The large vertical lines indicate the range of $j$'s where a linear fit was used to
estimate the heavy-tail index $\alpha$. The shorter vertical lines are 95% confidence intervals for the $\mathbb{E}Y_j$'s. The
reciprocal of the slope yields an estimate of $\hat\alpha(3,13) = 1.4957$. This range was selected automatically
with tuning level $p = 0.1$, discussed in Section 5.

2.2. The ideal Fréchet case

We start by assuming that $X(1), \ldots, X(N)$ is an i.i.d. sample of $\alpha$-Fréchet variables with
scale coefficient $\sigma_0 > 0$ and study the behavior of $\hat H_w(j_1,j_2)$ in this setting.

Consider the regression problem

$$ Y_j = j/\alpha + C + \varepsilon_j, \quad j_1 \le j \le j_2, \tag{2.11} $$

where

$$ C = C(\sigma_0,\alpha) = \mathbb{E}\log_2(\sigma_0 Z) = \log_2(\sigma_0) + \mathbb{E}\log_2(Z) \tag{2.12} $$

for an $\alpha$-Fréchet random variable $Z$ with unit scale coefficient, and where $1 \le j_1 < j_2 \le
[\log_2 N]$. In view of (2.6), we have that the errors $\varepsilon_j$ have zero means. They are, however,
dependent in $j$ due to the corresponding dependence of the $Y_j$ statistics in (2.7). Moreover,
the number of $D(j,k)$'s at a scale $j$ in (2.7) is $N_j = [N/2^j]$ and therefore the variances of the
$\varepsilon_j$'s grow exponentially in $j$. This implies that the minimal variance unbiased estimators of the
parameters of interest $\theta = (H, C)^{\top}$ that are linear in $Y_j$ are obtained through generalized least
squares (GLS). They are given by

$$ \hat\theta_\Sigma = (\hat H_\Sigma, \hat C_\Sigma)^{\top} = (A^{\top}\Sigma^{-1}A)^{-1}A^{\top}\Sigma^{-1}Y, \tag{2.13} $$

where $A = (a\ b)$ with $a^{\top} = (j_1, \ldots, j_2)$ and $b^{\top} = (1, \ldots, 1)$, and $\Sigma = (\mathrm{Cov}(Y_i, Y_j))_{j_1 \le i,j \le j_2}$ is
the covariance matrix of the vector $Y = (Y_j)_{j=j_1}^{j_2}$. An explicit expression of the matrix $\Sigma =
\Sigma_\alpha(j_1,j_2;N)$ is given next.

Proposition 2.1 Let $Y = (Y_j)_{j=j_1}^{j_2}$ be as in (2.7), where the underlying distribution of the
$X(k)$'s is $\alpha$-Fréchet with scale coefficient $\sigma_0 > 0$. Then, for all $j_1 \le i \le j \le j_2$,

$$ \mathbb{E}Y_j = j/\alpha + C(\sigma_0,\alpha), $$

and

$$ \mathrm{Cov}(Y_i,Y_j) = \Sigma_\alpha(j_1,j_2;N)_{i,j} = \frac{2^{j-i}}{\alpha^2 N_i}\,\psi(j-i), \quad N_i = [N/2^i], \tag{2.14} $$

where

$$ \psi(a) := \mathrm{Cov}\big(\log_2(Z_1), \log_2(Z_1 \vee (2^a - 1)Z_2)\big), \quad a \ge 0, \tag{2.15} $$

and where $Z_1$ and $Z_2$ are independent 1-Fréchet variables with unit scale coefficients.

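Proof: Let $j_1 \le i \le j \le j_2$ and observe that $N_i = 2^{j-i}N_j + R$, where $R$ is an integer with
$0 \le R < 2^{j-i}$. In view of (2.7),

$$ \begin{aligned}
\mathrm{Cov}(Y_i,Y_j) &= \frac{1}{N_iN_j}\sum_{k_1=1}^{N_i}\sum_{k_2=1}^{N_j} \mathrm{Cov}\big(\log_2 D(i,k_1), \log_2 D(j,k_2)\big) \\
&= \frac{1}{N_iN_j}\sum_{k_1=1}^{N_j}\sum_{\ell=1}^{2^{j-i}}\sum_{k_2=1}^{N_j} \mathrm{Cov}\big(\log_2 D(i,(k_1-1)2^{j-i}+\ell), \log_2 D(j,k_2)\big) \\
&\quad + \frac{1}{N_iN_j}\sum_{\ell=1}^{R}\sum_{k_2=1}^{N_j} \mathrm{Cov}\big(\log_2 D(i,N_j2^{j-i}+\ell), \log_2 D(j,k_2)\big),
\end{aligned} \tag{2.16} $$

where the last relation follows from expressing the sum $\sum_{k_1=1}^{N_i}$ as a double sum $\sum_{k_1=1}^{N_j}\sum_{\ell=1}^{2^{j-i}}$
plus the remainder term $\sum_{\ell=1}^{R}$. Observe that in view of (2.5), the terms
$\mathrm{Cov}(\log_2 D(i,(k_1-1)2^{j-i}+\ell), \log_2 D(j,k_2))$, $1 \le \ell \le 2^{j-i}$, are non-zero only if $k_1 = k_2$, since
otherwise $D(i,(k_1-1)2^{j-i}+\ell)$ and $D(j,k_2)$ involve maxima of non-overlapping
sets of $X(k)$'s. Note moreover that

$$ D(j,k_2) = D(i,(k_2-1)2^{j-i}+1) \vee \cdots \vee D(i,k_22^{j-i}), \tag{2.17} $$

where the $D(i,k)$'s are i.i.d. $\alpha$-Fréchet variables with scale coefficient $2^{i/\alpha}\sigma_0$ (see (7.3) below).
Therefore, for all $k = 1, \ldots, N_j$ and $\ell = 1, \ldots, 2^{j-i}$,

$$ \big(D(i,(k-1)2^{j-i}+\ell), D(j,k)\big) \stackrel{d}{=} \big(2^{i/\alpha}Z', \ 2^{i/\alpha}Z' \vee (2^{j} - 2^{i})^{1/\alpha}Z''\big), $$

where $Z'$ and $Z''$ are independent $\alpha$-Fréchet variables with scale coefficients $\sigma_0 > 0$. Observe
that $Z' \stackrel{d}{=} \sigma_0 Z_1^{1/\alpha}$, where $Z_1$ is 1-Fréchet with unit scale coefficient. Hence, for all $k_1 = k_2 =
1, \ldots, N_j$ and $\ell = 1, \ldots, 2^{j-i}$, we have

$$ \begin{aligned}
\mathrm{Cov}\big(\log_2 D(i,(k_1-1)2^{j-i}+\ell), \log_2 D(j,k_2)\big)
&= \mathrm{Cov}\big(\log_2(2^{i/\alpha}\sigma_0Z_1^{1/\alpha}), \log_2(2^{i/\alpha}\sigma_0Z_1^{1/\alpha} \vee (2^{j}-2^{i})^{1/\alpha}\sigma_0Z_2^{1/\alpha})\big) \\
&= \mathrm{Cov}\big(\log_2(Z_1^{1/\alpha}), \log_2(Z_1^{1/\alpha} \vee (2^{j-i}-1)^{1/\alpha}Z_2^{1/\alpha})\big) = \alpha^{-2}\psi(j-i).
\end{aligned} \tag{2.18} $$

The last two relations follow from the facts that $\log_2(2^{i/\alpha}\sigma_0Z_1^{1/\alpha})$ equals $\log_2(2^{i/\alpha}\sigma_0) +
\alpha^{-1}\log_2(Z_1)$ and since $\mathrm{Cov}(\xi + a, \eta + b) = \mathrm{Cov}(\xi,\eta)$, for any constants $a$ and $b$ and random
variables $\xi$ and $\eta$ with finite variance.

Note that the covariances in the remainder term in (2.16) vanish since $D(i,N_j2^{j-i}+\ell)$, $\ell =
1, \ldots, R$, are independent of $X(i)$, $i = 1, \ldots, N_j2^{j}$. Thus, by using Relation (2.18), the triple sum
in (2.16) reduces to $N_j2^{j-i}$ equal terms and we obtain

$$ \mathrm{Cov}(Y_i,Y_j) = \frac{N_j2^{j-i}}{N_iN_j}\,\alpha^{-2}\psi(j-i) = \frac{2^{j-i}}{\alpha^{2}N_i}\,\psi(j-i), $$

which is (2.14). □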

Remarks

1. Observe that the covariance matrix $\Sigma$ does not depend on the scale coefficient $\sigma_0$, which
is due to the fact that the $Y_j$'s are obtained through a logarithmic transformation of the
$X(k)$'s.

2. Observe that for all $1 \le j_1 < j_2 \le [\log_2 N]$ and $\alpha > 0$, we have by (2.14) that

$$ \Sigma_\alpha(j_1,j_2;N) = \frac{1}{\alpha^2}\,\Sigma_1(j_1,j_2;N), $$

where $\Sigma_1(j_1,j_2;N)$ corresponds to the covariance matrix of $Y = (Y_j)_{j=j_1}^{j_2}$ from a
1-Fréchet sample.

That is, the unknown parameter $\alpha$ appears only in the factor $1/\alpha^2$ of the covariance
matrix and thus the GLS estimators $\hat H_\Sigma$ and $\hat C_\Sigma$ do not depend on $\alpha$. Indeed, if one
multiplies $\Sigma$ by a factor $\phi$, the resulting estimates are not affected, since the formula
(2.13) involves the product of $\phi$ and its inverse.

This invariance property shows that the GLS estimators can be computed exactly, without
using plug-in approximations for the unknown parameter $\alpha$ involved in the matrix
$\Sigma$. A table in the Appendix contains values of $\psi(i)$ for $i = 0, 1, \ldots, 19$, obtained through
Monte Carlo simulations. This is sufficient to handle sample sizes of up to $2^{20} = 1{,}048{,}576$
observations.

3. Finally, $\Sigma_\alpha(j_1,j_2;N)$ is invertible, which follows from the fact that the joint distribution
of the $Y_j$'s has a density with respect to the Lebesgue measure.
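
The Monte Carlo evaluation of the covariance function $\psi$ mentioned in the remarks can be sketched in a few lines of Python. This is not the authors' code: the function name, the number of draws and the seed are arbitrary choices for illustration. The sketch relies on the fact that the reciprocal of a standard exponential variable is 1-Fréchet with unit scale coefficient.

```python
import math
import random


def psi_monte_carlo(a, n_draws=100_000, seed=0):
    """Monte Carlo estimate of psi(a) = Cov(log2 Z1, log2(Z1 v (2^a - 1) Z2))."""
    rng = random.Random(seed)
    c = 2.0 ** a - 1.0
    u, v = [], []
    for _ in range(n_draws):
        # If E ~ Exp(1), then 1/E is 1-Frechet: P(1/E <= x) = exp(-1/x).
        z1 = 1.0 / rng.expovariate(1.0)
        z2 = 1.0 / rng.expovariate(1.0)
        u.append(math.log2(z1))
        v.append(math.log2(max(z1, c * z2)))
    mean_u = sum(u) / n_draws
    mean_v = sum(v) / n_draws
    # Unbiased sample covariance of the two log-transformed coordinates.
    return sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v)) / (n_draws - 1)


if __name__ == "__main__":
    for a in range(0, 6):
        print(a, round(psi_monte_carlo(a), 4))
```

For $a = 0$ the quantity reduces to the variance of $\log_2 Z_1$, which equals $\pi^2/(6 \ln^2 2)$ because $\ln Z_1$ has a Gumbel distribution; this gives a simple sanity check for the simulation.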

In view of the above remarks, we have the following.

Corollary 2.1 The minimum variance unbiased estimators for $H$ and $C$ in the regression
model (2.11), linear in $Y_j$, are given by (2.13). Moreover, the covariance matrix of $\hat\theta_\Sigma$ is

$$ \alpha^{-2}\big(A^{\top}\Sigma_1^{-1}(j_1,j_2;N)\,A\big)^{-1}, $$

where $\Sigma_1(j_1,j_2;N)$ is the covariance matrix of the $Y_j$ statistics based on 1-Fréchet data.
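
The invariance of the GLS estimators with respect to $\alpha$ can be written out in two lines. The sketch below assumes only the GLS formula (2.13), $\mathrm{Cov}(Y) = \Sigma_\alpha$ and the factorization $\Sigma_\alpha = \alpha^{-2}\Sigma_1$; the arguments $(j_1,j_2;N)$ are suppressed.

```latex
\begin{align*}
(A^{\top}\Sigma_\alpha^{-1}A)^{-1}A^{\top}\Sigma_\alpha^{-1}Y
  &= (\alpha^{2}A^{\top}\Sigma_1^{-1}A)^{-1}\,\alpha^{2}A^{\top}\Sigma_1^{-1}Y
   = (A^{\top}\Sigma_1^{-1}A)^{-1}A^{\top}\Sigma_1^{-1}Y, \\
\mathrm{Cov}\big((A^{\top}\Sigma_\alpha^{-1}A)^{-1}A^{\top}\Sigma_\alpha^{-1}Y\big)
  &= (A^{\top}\Sigma_\alpha^{-1}A)^{-1}
   = \alpha^{-2}(A^{\top}\Sigma_1^{-1}A)^{-1}.
\end{align*}
```

The first line shows that the estimates can be computed with $\Sigma_1$ alone, while the second shows that their covariance still carries the factor $\alpha^{-2}$.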

[Figure 2: max-spectrum plot. Panel title: Max self-similarity: H = 0.67069 (0.0029822), α = 1.491. Horizontal axis: scales j.]

Fig 2. Displayed is an example of the max-spectrum of an i.i.d. $\alpha$-Fréchet sample of size $N = 2^{17} =
131{,}072$ with $\alpha = 1.5$. Observe that the max-spectrum is perfectly linear in $j$. The vertical intervals
around every $Y_j$ point indicate 95% confidence intervals for the mean of $Y_j$ based on normal approximation.
Observe that these confidence intervals grow with the scale $j$. GLS regression based on all scales
$1 \le j \le 17$ was used to obtain an estimate $\hat\alpha = 1.491$. The estimated standard deviation of the slope
$\hat H = 0.67$ is indicated in parentheses: $\hat\sigma_H = 0.00298$. This last estimate is based on the asymptotic
variance of $\hat H$ (see the corresponding proposition in Section 4).

In Figure 2 the max-spectrum of a sample from a Fréchet distribution with $N = 2^{17}$
observations is shown. As expected, the max-spectrum is essentially linear in $j$ and the slope
yields a very good estimate of $1/\alpha$. The asymptotic properties of estimators based on the
max-spectrum of general heavy-tailed samples are established in Section 4. In practice, when the
sample is not Fréchet, the max-spectrum is linear in $j$ only on a range of the largest scales $j$.
The problem of choosing the "best" range of scales to estimate $\alpha$ is very important in practice
and is briefly addressed in Section 5.

3. Rates for moment type functionals of heavy tailed maxima 

In this section, we establish some results for moment-type functionals obtained from maxima of
heavy-tailed data. They prove useful in establishing the consistency and asymptotic normality 
of the max self-similar estimators under general conditions, but are also of independent interest 
since they yield exact rates of convergence in many cases. 

Let $X(1), X(2), \ldots$ be i.i.d. random variables with c.d.f.

$$ F(x) = \exp\{-\sigma^{\alpha}(x)\,x^{-\alpha}\}, \quad x > 0, \tag{3.1} $$

where $\alpha > 0$, and where the function $\sigma(x) > 0$ is such that $\sigma(x) \to \sigma_0 > 0$, as $x \to \infty$.
Here, we let the function $\sigma(x)$ take values in the extended half-line $(0,\infty]$, that is, $\sigma(x)$ can
take the value $\infty$, in which case $F(x)$ becomes $e^{-\infty} = 0$ (see the Examples below). Such a
representation always exists if the c.d.f. $F$ belongs to the normal domain of attraction of an
$\alpha$-Fréchet distribution, that is, if

$$ M_n := \frac{1}{n^{1/\alpha}} \bigvee_{i=1}^{n} X(i) \stackrel{d}{\longrightarrow} Z, \quad \text{as } n \to \infty, \tag{3.2} $$

where $G(x) := \mathbb{P}\{Z \le x\} = \exp\{-\sigma_0^{\alpha} x^{-\alpha}\}$, $x > 0$, for some $\sigma_0 > 0$. For simplicity, we suppose
that the $X(i)$'s are positive, almost surely, that is, $F(0) = 0$. The case when the $X(i)$'s can
take negative values is addressed in Section 4 below.

Our goal here is to establish bounds on the rate of convergence of $\mathbb{E}f(M_n)$ to $\mathbb{E}f(Z)$, as
$n \to \infty$, for an absolutely continuous function $f : (0,\infty) \to \mathbb{R}$. We do so under general
conditions on the asymptotic tail behavior of the c.d.f. $F(x)$.
In what follows, the next two conditions on the c.d.f. $F(x)$ are needed:

Condition 3.1 For some $\beta > 0$ and $C_1 > 0$,

$$ |\sigma^{\alpha}(x) - \sigma_0^{\alpha}| \le C_1 x^{-\beta}, \quad \text{for all sufficiently large } x > 0, \tag{3.3} $$

and

Condition 3.2 We have $F(0) = 0$ and for some $C_2 > 0$,

$$ \sigma^{\alpha}(x) \ge C_2 \min\{1, x^{\gamma}\}, \quad x > 0, \quad \text{for some } \gamma \in (0,\alpha). \tag{3.4} $$

In the examples below, we show that Conditions 3.1 and 3.2 hold in many cases of
practical interest. The second condition concerns the behavior of $F(x)$ for small $x$, and ensures
that $\mathbb{E}(X^{p}1_{\{X \le 1\}}) < \infty$, for any $p \in \mathbb{R}$. This condition always holds, for example, if the $X(i)$'s
are bounded away from zero, almost surely. The case of arbitrary $X(i)$'s which can possibly
take negative values is addressed in Section 4.

The following result provides an upper bound on $|\mathbb{E}f(M_n) - \mathbb{E}f(Z)|$ under the above
conditions for a general class of absolutely continuous functions $f$. Namely, we shall suppose that
$f(x) = f(x_0) + \int_{x_0}^{x} f'(u)\,du$, $x > 0$, for some (any) $x_0 \in (0,\infty)$, with $f'$ being a locally integrable
function.

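Theorem 3.1 Let $f(x)$, $x > 0$ be an absolutely continuous function on all compact intervals
$[a,b] \subset (0,\infty)$. Let also $F_n(x) := \mathbb{P}\{M_n \le x\}$ and $G(x) = \mathbb{P}\{Z \le x\}$, $x \in \mathbb{R}$, be the c.d.f.'s of
the random variables $M_n$ and $Z$ in (3.2). Suppose that Conditions 3.1 and 3.2 hold.
(a) If for some $m \in \mathbb{R}$ and $\delta > 0$,

$$ x^{m}|f(x)| + \operatorname*{ess\,sup}_{0<y\le x} y^{m}|f'(y)| \to 0, \ x \downarrow 0, \quad \text{and} \quad x^{-\alpha}|f(x)| + x^{1+\delta} \operatorname*{ess\,sup}_{y \ge x} y^{-\alpha}|f'(y)| \to 0, \ x \to \infty, \tag{3.5} $$

then $\mathbb{E}|f(Z)|$ and $\mathbb{E}|f(M_n)|$, $n \in \mathbb{N}$, are finite. Moreover,

$$ \mathbb{E}f(M_n) - \mathbb{E}f(Z) = \int_0^{\infty} \big(G(x) - F_n(x)\big)\,f'(x)\,dx. \tag{3.6} $$

Here ess sup denotes the essential supremum of a measurable function $g$, that is,

$$ \operatorname*{ess\,sup}_{y \in A} g(y) := \inf_{A_0 \subset A,\ |A \setminus A_0| = 0}\ \sup_{y \in A_0} g(y), $$

for any Borel set $A$, where $|A|$ denotes the Lebesgue measure of the set $A$.

(b) If in addition to (3.5), $\int_1^{\infty} x^{-(\alpha+\beta)}|f'(x)|\,dx < \infty$, then for any $\epsilon(n) \downarrow 0$, such that
$n^{1/\alpha}\epsilon(n) \to \infty$, as $n \to \infty$, we have

$$ |\mathbb{E}f(M_n) - \mathbb{E}f(Z)| \le C_1 n^{-\beta/\alpha} \int_0^{\infty} x^{-(\alpha+\beta)} e^{-c x^{-\alpha}} |f'(x)|\,dx + 2 \int_0^{\epsilon(n)} e^{-C_2 x^{-(\alpha-\gamma)}} |f'(x)|\,dx, \tag{3.7} $$

for all sufficiently large $n$, where $c \in (0,\sigma_0^{\alpha})$ can be chosen arbitrarily close to $\sigma_0^{\alpha}$. Moreover,

$$ |\mathbb{E}f(M_n) - \mathbb{E}f(Z)| \le C_f\, n^{-\beta/\alpha}, \tag{3.8} $$

for all sufficiently large $n$ with some $C_f > 0$.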

Proof: We first prove part (a). Let $f(x) = f(x_0) + \int_{x_0}^{x} f'(u)\,du$, $x > 0$, with $x_0 \in (0,\infty)$,
where $f'(x)$, $x \in (0,\infty)$ is locally integrable, and where $\int_{x_0}^{x} = -\int_{x}^{x_0}$ if $x < x_0$. Let now $[a,b] \subset
(0,\infty)$, $x_0 \in (a,b)$, be an arbitrary interval and observe that $\int_a^b f(x)\,dF_n(x)$ equals

$$ \begin{aligned}
\int_a^{x_0} f(x)\,dF_n(x) + \int_{x_0}^{b} f(x)\,d(F_n(x) - 1)
&= F_n(x_0)f(x_0) - F_n(a)f(a) - \int_a^{x_0} F_n(x)f'(x)\,dx \\
&\quad + (F_n(b) - 1)f(b) - (F_n(x_0) - 1)f(x_0) - \int_{x_0}^{b} (F_n(x) - 1)f'(x)\,dx
\end{aligned} \tag{3.9} $$

$$ = (F_n(b) - 1)f(b) - F_n(a)f(a) + f(x_0) - \int_a^{x_0} F_n(x)f'(x)\,dx + \int_{x_0}^{b} (1 - F_n(x))f'(x)\,dx. \tag{3.10} $$

The equality in Relation (3.9) follows from Lemma 7.1.

In view of Relation (3.10), the monotone convergence theorem implies that $\mathbb{E}|f(M_n)| =
\int_0^{\infty} |f(x)|\,dF_n(x)$ is finite if

$$ |(F_n(b) - 1)f(b)| + |F_n(a)f(a)| \to 0, \quad \text{as } a \downarrow 0 \text{ and } b \to \infty, \tag{3.11} $$

and if

$$ \int_0^{x_0} F_n(x)|f'(x)|\,dx + \int_{x_0}^{\infty} (1 - F_n(x))|f'(x)|\,dx < \infty. \tag{3.12} $$

imsart ver. 2006/03/07 file: max-spectrum-l.tex date: February 2, 2008 



Stoev et al. /Estimating heavy-tail exponents through max self-similarity 16 
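Observe that by (3.1),

$$ F_n(x) = F(n^{1/\alpha}x)^{n} = \exp\{-\sigma^{\alpha}(n^{1/\alpha}x)\,x^{-\alpha}\}, \quad x > 0. $$

Hence, in view of the convergence $\sigma(x) \to \sigma_0$, as $x \to \infty$, we have

$$ 1 - F_n(x) \sim \sigma_0^{\alpha} x^{-\alpha}, \quad \text{as } x \to \infty, \tag{3.13} $$

since $1 - e^{-u} \sim u$ as $u \downarrow 0$. Thus, the second convergence in (3.5) implies $(F_n(b) - 1)f(b) \to
0$, $b \to \infty$. On the other hand, by (3.4), for $n \ge 1$, $n \in \mathbb{N}$,

$$ \sigma^{\alpha}(n^{1/\alpha}x) \ge C_2 n^{\gamma/\alpha}x^{\gamma} \ge C_2 x^{\gamma}, \quad \text{for all } x \in (0, n^{-1/\alpha}), \tag{3.14} $$

and hence

$$ F_n(x) = \exp\{-\sigma^{\alpha}(n^{1/\alpha}x)\,x^{-\alpha}\} \le \exp\{-C_2 x^{-(\alpha-\gamma)}\}, \quad \text{for all } x \in (0, n^{-1/\alpha}). \tag{3.15} $$

Thus, since $u^{p}e^{-u} \to 0$, as $u \to \infty$, for any $p \in \mathbb{R}$, the first convergence in (3.5) implies that
$F_n(a)f(a) \to 0$, as $a \downarrow 0$. We have thus shown that (3.11) holds. One can similarly show that
the integrals in (3.12) are finite by using the conditions in (3.5) on $f'$ and Relations (3.13)
and (3.14). Indeed, for almost all $x > 0$, we have

$$ F_n(x)|f'(x)| \le \Big(\sup_{0<y\le x} F_n(y)\,y^{-m}\Big)\Big(\operatorname*{ess\,sup}_{0<y\le x} y^{m}|f'(y)|\Big) = O\big(x^{-|m|}\exp\{-C_2 x^{-(\alpha-\gamma)}\}\big) \to 0, \tag{3.16} $$

as $x \downarrow 0$ and, for almost all $x > 0$,

$$ (1 - F_n(x))|f'(x)| \le \Big(\sup_{y \ge x} (1 - F_n(y))\,y^{\alpha}\Big)\Big(\operatorname*{ess\,sup}_{y \ge x} y^{-\alpha}|f'(y)|\Big) = O\big(x^{-(1+\delta)}\big), \tag{3.17} $$

as $x \to \infty$. We have thus shown that $\int_0^{\infty} |f(x)|\,dF_n(x) < \infty$ for all $n \in \mathbb{N}$. One can similarly
show that $\int_0^{\infty} |f(x)|\,dG(x) < \infty$, by replacing $F_n(x)$ with $G(x)$ above, and using the fact that
$G(x) = \exp\{-\sigma_0^{\alpha}x^{-\alpha}\}$, $x > 0$, satisfies trivially Conditions 3.1 and 3.2.
Observe that (3.6) follows from the relations

$$ \int_0^{\infty} f(x)\,dF_n(x) = f(x_0) - \int_0^{x_0} F_n(x)f'(x)\,dx + \int_{x_0}^{\infty} (1 - F_n(x))f'(x)\,dx $$

and

$$ \int_0^{\infty} f(x)\,dG(x) = f(x_0) - \int_0^{x_0} G(x)f'(x)\,dx + \int_{x_0}^{\infty} (1 - G(x))f'(x)\,dx. $$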

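We now turn to proving part (b). Let $\epsilon(n) \downarrow 0$ be such that $n^{1/\alpha}\epsilon(n) \to \infty$, as $n \to \infty$. By
using the triangle inequality, we get

$$ |\mathbb{E}f(M_n) - \mathbb{E}f(Z)| \le \int_0^{\epsilon(n)} G(x)|f'(x)|\,dx + \int_0^{\epsilon(n)} F_n(x)|f'(x)|\,dx + \int_{\epsilon(n)}^{\infty} |F_n(x) - G(x)||f'(x)|\,dx =: I_1 + I_2 + I_3. $$

We first consider the integral $I_3$. Since $n^{1/\alpha}\epsilon(n) \to \infty$, $n \to \infty$, in view of (3.3), for all
sufficiently large $n$, we have

$$ |F_n(x) - G(x)| = |\sigma^{\alpha}(n^{1/\alpha}x) - \sigma_0^{\alpha}|\,x^{-\alpha}e^{-\theta_n(x)x^{-\alpha}} \le C_1 n^{-\beta/\alpha}x^{-(\alpha+\beta)}e^{-c x^{-\alpha}}, \tag{3.18} $$

for all $x \in (\epsilon(n),\infty)$, where $c$ is an arbitrary constant in $(0,\sigma_0^{\alpha})$, and where $\theta_n(x)$ is between
$\sigma^{\alpha}(n^{1/\alpha}x)$ and $\sigma_0^{\alpha}$. Indeed, the first relation in (3.18) follows by the mean value theorem applied
to the function $g(u) = \exp\{-u x^{-\alpha}\}$, $u > 0$. The inequality in (3.18) follows from (3.3) since
$n^{1/\alpha}\epsilon(n) \to \infty$ implies $\inf_{x \ge \epsilon(n)} \sigma^{\alpha}(n^{1/\alpha}x) > c$, $c \in (0,\sigma_0^{\alpha})$, for all sufficiently large $n$.
Therefore (3.18) implies

$$ I_3 \le C_1 n^{-\beta/\alpha} \int_{\epsilon(n)}^{\infty} x^{-(\alpha+\beta)}e^{-c x^{-\alpha}}|f'(x)|\,dx \le C_1 n^{-\beta/\alpha} \int_0^{\infty} x^{-(\alpha+\beta)}|f'(x)|e^{-c x^{-\alpha}}\,dx, $$

for all sufficiently large $n$. The last integral is finite. Indeed, by assumption
$\int_1^{\infty} x^{-(\alpha+\beta)}|f'(x)|\,dx < \infty$. The integral $\int_0^{1} x^{-(\alpha+\beta)}|f'(x)|e^{-c x^{-\alpha}}\,dx$ is finite since, in view of (3.5),

$$ \Big(\operatorname*{ess\,sup}_{0<y\le x} y^{m}|f'(y)|\Big)x^{-(\alpha+\beta+|m|)}e^{-c x^{-\alpha}} = O\big(x^{-(\alpha+\beta+|m|)}e^{-c x^{-\alpha}}\big) = O(x^{p}), \quad x \downarrow 0, \tag{3.19} $$

for any $p > 0$.

We now consider the integral $I_2$. Observe that $\epsilon(n) > n^{-1/\alpha}$ eventually, and hence

$$ I_2 \le \int_0^{n^{-1/\alpha}} \exp\{-C_2 x^{-(\alpha-\gamma)}\}|f'(x)|\,dx + \int_{n^{-1/\alpha}}^{\epsilon(n)} F_n(x)|f'(x)|\,dx, \tag{3.20} $$

by (3.15). Relation (3.4) implies that $\sigma^{\alpha}(n^{1/\alpha}x) \ge C_2$, for all $x \in (n^{-1/\alpha},\epsilon(n))$, and hence
$F_n(x) \le \exp\{-C_2 x^{-\alpha}\} \le \exp\{-C_2 x^{-(\alpha-\gamma)}\}$, $x \in (n^{-1/\alpha},\epsilon(n))$. Therefore, the second integral
in (3.20) can be bounded above by $\int_{n^{-1/\alpha}}^{\epsilon(n)} \exp\{-C_2 x^{-(\alpha-\gamma)}\}|f'(x)|\,dx$ and hence

$$ I_2 \le \int_0^{\epsilon(n)} \exp\{-C_2 x^{-(\alpha-\gamma)}\}|f'(x)|\,dx. $$

One can similarly bound $I_1$. Indeed, Relation (3.4) implies that $\sigma_0^{\alpha} \ge C_2$, since $\sigma^{\alpha}(x) \to
\sigma_0^{\alpha}$, $x \to \infty$. For all $0 < x < \epsilon(n) < 1$ and $\gamma \in (0,\alpha)$, we have $x^{-\alpha} > x^{-(\alpha-\gamma)}$, and hence we
obtain

$$ I_1 = \int_0^{\epsilon(n)} \exp\{-\sigma_0^{\alpha}x^{-\alpha}\}|f'(x)|\,dx \le \int_0^{\epsilon(n)} \exp\{-C_2 x^{-(\alpha-\gamma)}\}|f'(x)|\,dx. $$

The last three bounds for $I_1$, $I_2$ and $I_3$ imply (3.7).

Now, to prove (3.8), observe that, as in (3.19), since $\alpha - \gamma > 0$, for almost all $x > 0$, we have

$$ \exp\{-C_2 x^{-(\alpha-\gamma)}\}|f'(x)| \le O\big(x^{-|m|}e^{-C_2 x^{-(\alpha-\gamma)}}\big) = O(x^{p}), \quad x \downarrow 0, \tag{3.21} $$

for any $p > 0$. Thus, the second integral in (3.7) is of order $O(\epsilon(n)^{p})$, for any $p > 0$, and by
setting $\epsilon(n) := n^{-\delta}$ for some $\delta \in (0, 1/\alpha)$, we obtain that (3.8) holds. This completes the proof
of the theorem. □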

In the following examples we show that most heavy-tailed distributions of practical interest
satisfy the conditions of Theorem 3.1.

Examples: 

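• (Pareto laws) Let $F(x) = 1 - (x/\sigma_0)^{-\alpha}$, $x \ge \sigma_0$, and $F(x) = 0$, $x < \sigma_0$, for some $\sigma_0 > 0$
and $\alpha > 0$. Then, Relation (3.1) holds with

$$\sigma^\alpha(x) = \infty\, 1_{(0,\sigma_0]}(x) - x^{\alpha}\ln\big(1 - (x/\sigma_0)^{-\alpha}\big) 1_{(\sigma_0,\infty)}(x),$$

that is, the function $\sigma^\alpha(x)$ equals $\infty$ for all $x \in (0, \sigma_0]$ to account for the fact that
$F(x) = 0$, $x \in (0, \sigma_0]$.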

Observe that $\sigma^\alpha(x)$ satisfies Condition 3.1 with $\beta = \alpha$. Indeed, since $\ln(1-u) = -u - u^2/2 + O(u^3)$, $u \to 0$, by setting $u := (x/\sigma_0)^{-\alpha}$, we obtain

$$|\sigma^\alpha(x) - \sigma_0^\alpha| = \Big|x^{\alpha}\ln\big(1 - (x/\sigma_0)^{-\alpha}\big) + \sigma_0^\alpha\Big| = \sigma_0^\alpha\Big|\frac{\ln(1-u)}{u} + 1\Big| \le \sigma_0^\alpha u = \sigma_0^{2\alpha} x^{-\alpha},$$ (3.22)

for all sufficiently large $x$.
One has, moreover, that

$$\sigma^\alpha(x) - \sigma_0^\alpha \sim \sigma_0^{2\alpha} x^{-\alpha}/2, \quad \text{as } x \to \infty$$ (3.23)

(see Proposition 3.1 below).

Condition 3.2 also holds. Indeed, $\sigma(x) = \infty > x^{\gamma}$, for all $x \in (0, \sigma_0]$ and $\gamma \in (0, \alpha)$.
To prove (3.4), it remains to show that $\sigma^\alpha(x) \ge c_2 > 0$, for all $x > 0$. As shown in
(3.22) above, $\sigma^\alpha(x) \to \sigma_0^\alpha$, $x \to \infty$, where $\sigma_0 > 0$. On the other hand, $\sigma^\alpha(x)$ is a positive,
continuous function over all compact intervals of $(\sigma_0, \infty)$ and $\sigma^\alpha(x) \to \infty$, as $x \downarrow \sigma_0$.
This shows that $\sigma^\alpha(x)$ is bounded below by a positive constant.

• (Products of Fréchet laws) Let $F(x) = G_{\alpha_0}(x/\sigma_0)\,G_{\alpha_1}(x/\sigma_1)$, where $\sigma_0, \sigma_1 > 0$ and
$0 < \alpha_0 < \alpha_1$, and where $G_\alpha(x) = \exp\{-x^{-\alpha}\}$, $x > 0$, denotes the c.d.f. of a standard
$\alpha$-Fréchet variable. Observe that the function $F(x)$ is the c.d.f. of $\max\{\sigma_0 Z_0, \sigma_1 Z_1\}$,
where $Z_0$ and $Z_1$ are independent standard $\alpha_0$- and $\alpha_1$-Fréchet random variables, respectively.
Therefore, (3.1) holds with $\alpha = \alpha_0$ and

$$\sigma^\alpha(x) = \sigma_0^{\alpha_0} + \sigma_1^{\alpha_1} x^{-(\alpha_1-\alpha_0)}, \quad x > 0.$$ (3.24)

Conditions 3.1 and 3.2 are readily satisfied, where $\beta = \alpha_1 - \alpha_0 > 0$.

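• (Mixtures of Pareto laws) Let

$$F(x) = p\big(1 - (x/\sigma_0)^{-\alpha_0}\big)1_{\{x \ge \sigma_0\}} + (1-p)\big(1 - (x/\sigma_1)^{-\alpha_1}\big)1_{\{x \ge \sigma_1\}}, \quad 0 < \alpha_0 < \alpha_1,$$
where $p \in (0, 1)$ and $\sigma_0, \sigma_1 > 0$.

Then, (3.1) holds with $\alpha = \alpha_0$, and $\sigma^\alpha(x) = \infty\, 1_{(0,\sigma_*]}(x) - x^{\alpha}\ln(F(x))\, 1_{(\sigma_*,\infty)}(x)$, where
$\sigma_* := \min\{\sigma_0, \sigma_1\} > 0$.

As in the case of Pareto laws, one can show that Condition 3.1 holds with $\beta = \min\{\alpha_0, \alpha_1 - \alpha_0\}$ and $\sigma_0^\alpha$ replaced by $p\sigma_0^{\alpha_0}$. In fact,

$$\sigma^\alpha(x) - p\sigma_0^{\alpha_0} \sim C_0 x^{-\beta}, \quad \text{as } x \to \infty,$$ (3.25)
where

$$C_0 = \begin{cases} \sigma_1^{\alpha_1}(1-p), & \text{if } \alpha_1 - \alpha_0 < \alpha_0, \\ \sigma_1^{\alpha_1}(1-p) + p^2\sigma_0^{2\alpha_0}/2, & \text{if } \alpha_1 - \alpha_0 = \alpha_0, \\ p^2\sigma_0^{2\alpha_0}/2, & \text{if } \alpha_1 - \alpha_0 > \alpha_0. \end{cases}$$

One can also show that Condition 3.2 holds as in the case of Pareto laws.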
• Absolute values of $\alpha$-stable ($0 < \alpha < 2$) and $t$-distributed random variables $X_i$'s, for
example, also satisfy Condition 3.1. They do not satisfy Condition 3.2, however, since
$E(|X_1|^{-p} 1_{\{|X_1| < 1\}})$ is infinite. In Proposition 4.3 below, we address the general case where
Condition 3.2 fails and in fact the case where the $X_i$'s can take negative values.
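The Pareto example can be checked numerically. The following is a minimal sketch, not part of the original argument: it evaluates $\sigma^\alpha(x) = -x^{\alpha}\ln(1 - (x/\sigma_0)^{-\alpha})$ for $x > \sigma_0$ and compares the gap $\sigma^\alpha(x) - \sigma_0^\alpha$ with the bound $\sigma_0^{2\alpha}x^{-\alpha}$ of (3.22) and with the leading term $\sigma_0^{2\alpha}x^{-\alpha}/2$ obtained from the expansion of $\ln(1-u)$. The function name and the parameter values $\alpha = 1.5$, $\sigma_0 = 2$ are illustrative assumptions.

```python
import math

def sigma_alpha_pareto(x, alpha, sigma0):
    """sigma^alpha(x) for the Pareto law F(x) = 1 - (x/sigma0)^(-alpha), x >= sigma0."""
    if x <= sigma0:
        return math.inf  # F(x) = 0 on (0, sigma0]
    u = (x / sigma0) ** (-alpha)
    # log1p keeps precision when u is tiny (large x)
    return -(x ** alpha) * math.log1p(-u)

alpha, sigma0 = 1.5, 2.0           # illustrative values, not from the paper
limit = sigma0 ** alpha            # sigma_0^alpha
for x in (10.0, 100.0, 1000.0):
    gap = sigma_alpha_pareto(x, alpha, sigma0) - limit
    bound = sigma0 ** (2 * alpha) * x ** (-alpha)   # right-hand side of (3.22)
    leading = bound / 2.0                           # leading-order term of the gap
    print(f"x={x:g}  gap={gap:.3e}  bound={bound:.3e}  gap/leading={gap / leading:.6f}")
```

The printed ratio `gap/leading` should approach 1 as `x` grows, which is the behavior that makes the bias rate exact in Proposition 3.1.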

The following result shows that the rate $n^{-\beta/\alpha}$ in (3.8) is optimal, if so is the inequality in
Condition 3.1.

Proposition 3.1 Assume that $F$ is as in (3.1) and satisfies Conditions 3.1 and 3.2 above, and
let $f$ be as in Theorem 3.1 (b). Suppose, in addition, that $\sigma^\alpha(x) - \sigma_0^\alpha \sim C_1 x^{-\beta}$, as $x \to \infty$,
for some $C_1 \ne 0$. Then

$$n^{\beta/\alpha}\big(E f(M_n) - E f(Z)\big) \longrightarrow C_1 \int_0^\infty x^{-(\alpha+\beta)} f'(x) e^{-\sigma_0^\alpha x^{-\alpha}}\,dx, \quad \text{as } n \to \infty.$$ (3.26)

Proof: Let, as in Theorem 3.1, $\epsilon(n) \downarrow 0$ be such that $n^{1/\alpha}\epsilon(n) \to \infty$, as $n \to \infty$. The
triangle inequality applied to Relation (3.6) implies

$$\Big|E f(M_n) - E f(Z) - \int_{\epsilon(n)}^\infty \big(G(x) - F_n(x)\big) f'(x)\,dx\Big| \le \int_0^{\epsilon(n)} G(x)|f'(x)|\,dx + \int_0^{\epsilon(n)} F_n(x)|f'(x)|\,dx.$$ (3.27)

As in the proof of Theorem 3.1 one can show that the integrals in the right-hand side of the
last expression are of order $o(n^{-\beta/\alpha})$, as $n \to \infty$, if $\epsilon(n) := n^{-\delta}$, $\delta \in (0, 1/\alpha)$ (see (3.21)).

To establish (3.26) we will now examine the order of the integral in the left-hand side of
(3.27). Observe that

$$n^{\beta/\alpha}\big(\sigma^\alpha(n^{1/\alpha}x) - \sigma_0^\alpha\big) \longrightarrow C_1 x^{-\beta},$$ (3.28)

as $n \to \infty$, for all $x > 0$. Hence (as in Theorem 3.1), in view of (3.1) and (3.28), the mean
value theorem implies

$$n^{\beta/\alpha}\big(G(x) - F_n(x)\big) f'(x) \longrightarrow C_1 x^{-(\alpha+\beta)} f'(x) e^{-\sigma_0^\alpha x^{-\alpha}},$$

as $n \to \infty$, for any $x \in (\epsilon(n), \infty)$ and hence for any $x > 0$ ($\epsilon(n) \to 0$, $n \to \infty$). As in the proof
of Theorem 3.1 one can show that the left-hand side of the last expression is bounded above
in absolute value by an integrable function. Therefore, the dominated convergence theorem
implies that $n^{\beta/\alpha}\int_{\epsilon(n)}^\infty (G(x) - F_n(x)) f'(x)\,dx$ converges to the integral in (3.26), as $n \to \infty$. □



The next result, which follows directly from Theorem 3.1, is used in Section 4.

Corollary 3.1 Assume that $F$ is as in (3.1) and satisfies Conditions 3.1 and 3.2 above. Then
$E|\ln(M_n)|^{p} < \infty$ for all $n \in \mathbb{N}$ and $p > 0$. Moreover, for any $p > 0$ and $k \in \mathbb{N}$, we have

$$E|\ln(M_n)|^{p} - E|\ln(Z)|^{p} = O(n^{-\beta/\alpha}) \quad \text{and} \quad E\ln(M_n)^{k} - E\ln(Z)^{k} = O(n^{-\beta/\alpha}),$$

as $n \to \infty$, where $M_n$ and $Z$ are as in Theorem 3.1.

In Section 4 one encounters covariance functionals of maxima over blocks of heavy-tailed
variables, that is, bivariate moment-type functionals arise. The following result establishes
rates of convergence for such functionals in the special case of logarithms.

Corollary 3.2 Suppose that $F$ is as in (3.1) and satisfies Conditions 3.1 and 3.2. Let
$X(1), \ldots, X(n)$ and $Y(1), \ldots, Y(m)$, $n, m \in \mathbb{N}$, be i.i.d. random variables with c.d.f. $F(x)$.
Consider the normalized maxima

$$M_n^X := n^{-1/\alpha}\max_{1 \le i \le n} X(i) \quad \text{and} \quad M_m^Y := m^{-1/\alpha}\max_{1 \le i \le m} Y(i).$$

Then, for any $a > 0$, as $n, m \to \infty$, we have that

$$E\ln(M_n^X)\ln(M_n^X \vee aM_m^Y) - E\ln(Z_X)\ln(Z_X \vee aZ_Y) = O(n^{-\beta/\alpha} + m^{-\beta/\alpha}),$$ (3.29)
where $Z_X$ and $Z_Y$ are independent $\alpha$-Fréchet random variables with scale coefficients $\sigma_0$.

Corollary 3.2 is stated in a generality that allows different numbers of $X(i)$'s
and $Y(i)$'s ($n$ and $m$, respectively) in the maxima $M_n^X$ and $M_m^Y$. This flexibility is needed for
the proof of Proposition 4.1 below.

Proof of Corollary 3.1: Let $f(x) = |\ln(x)|^{p}$, $p > 0$, $x > 0$. Observe that $f(x) = \int_1^x f'(u)\,du$,
where $f'(x) = p|\ln(x)|^{p-1}/x$ for $x > 1$ and $f'(x) = -p|\ln(x)|^{p-1}/x$, for $0 < x < 1$.
One can verify that the conditions in (3.5) are fulfilled and therefore, Theorem 3.1 implies the
result. The argument in the case when $f(x) = (\ln(x))^{k}$, $k \in \mathbb{N}$, is similar. □

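Proof of Corollary 3.2: By Corollary 3.1 the expected values in (3.29) exist since
$E|\ln(M_n^X)|^{p} < \infty$, $\forall p > 0$, and since $a \vee b \le a + b$ for any $a, b > 0$. Observe that by
independence and Fubini's theorem, the left-hand side of (3.29) equals

$$\int_0^\infty\Big(\int_0^\infty f(x,y)\,dF_n(x)\Big)dF_m(y) - \int_0^\infty\Big(\int_0^\infty f(x,y)\,dG(x)\Big)dG(y),$$

where $f(x,y) = \ln(x)\ln(x \vee ay)$, $x, y, a > 0$, $F_n(x) := F(n^{1/\alpha}x)^{n}$ is the c.d.f. of $M_n^X$ (and
$M_n^Y$), and where $G(x) = \exp\{-\sigma_0^\alpha x^{-\alpha}\}$, $x > 0$. Now, by adding and subtracting the term
$\int_0^\infty\big(\int_0^\infty f(x,y)\,dG(x)\big)dF_m(y)$, applying Fubini's theorem and then the triangle inequality, we
obtain that the left-hand side of (3.29) is bounded above in absolute value by

$$\int_0^\infty\Big|\int_0^\infty f(x,y)\big(dG(x) - dF_n(x)\big)\Big|\,dF_m(y) + \int_0^\infty\Big|\int_0^\infty f(x,y)\big(dG(y) - dF_m(y)\big)\Big|\,dG(x) =: I_1 + I_2.$$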
Focus next on the term $I_1$. Let $g(y) := \int_0^\infty f(x,y)\big(dG(x) - dF_n(x)\big)$, $y > 0$. Observe that for
each $y > 0$, $y \ne x/a$, $f(x,y)$ is differentiable in $x$ since

$$f(x,y) = \ln(x)\ln(ay), \quad f_x(x,y) = \ln(ay)/x, \quad 0 < x < ay,$$

and

$$f(x,y) = \ln(x)^2, \quad f_x(x,y) = 2\ln(x)/x, \quad x > ay.$$

In fact

$$|f_x(x,y)| \le 2|\ln(x)|/x + |\ln(ay)|/x, \quad x > 0,\ y > 0.$$

Thus, Theorem 3.1 (b), applied to the inner integral $g(y)$ in $I_1$, implies

$$|g(y)| \le n^{-\beta/\alpha}\big(C' + C''|\ln(y)|\big),$$ (3.30)

for all sufficiently large $n$, where the constants $C' > 0$ and $C'' > 0$ do not depend on $y$. (This
follows from Relation (3.7) by taking $\epsilon(n) := n^{-\delta}$, $\delta \in (0, 1/\alpha)$, and observing that the second
integral therein is negligible with respect to the term $(1 + |\ln(y)|)n^{-\beta/\alpha}$.)

Note now that the function $|\ln(y)|$ satisfies the assumptions of Theorem 3.1 (b) and hence
$\int_0^\infty |\ln(y)|\,dF_m(y) \to \int_0^\infty |\ln(y)|\,dG(y)$, as $m \to \infty$. Therefore, the inequality (3.30) implies
that $I_1 = O(n^{-\beta/\alpha})$, as $n \to \infty$. One can similarly show that $I_2 = O(m^{-\beta/\alpha})$, $m \to \infty$. □

4. Asymptotic properties of the max self similarity estimators 

We establish here the consistency and asymptotic normality of the estimators defined in (2.9)
above. In fact, we prove joint asymptotic normality of the max self-similarity estimators of the
tail exponent $\alpha$ and the scale coefficient $\sigma_0$. These results rely on the behavior of moment-type
functionals of heavy-tailed maxima established in Section 3.

The general case where the $X(i)$'s may be zero or even take negative values is addressed at the
end of this section.

Let the $Y_j$'s be defined as in (2.7), where now $N$ denotes the sample size of available $X(i)$'s,
$1 \le j \le [\log_2 N]$ and where $N_j := [N/2^{j}]$. As noted above, the larger the scales $j$, the
more precise the asymptotic relation (2.8). Therefore, to obtain consistent estimates for the
parameter $H = 1/\alpha$ one should focus on a range of scales which grows as the sample size
increases. We therefore fix a range $j_1 \le j \le j_2$, $j_1, j_2 \in \mathbb{N}$, and focus on the vectors

$$Y_r := \{Y_{j+r}\}_{j=j_1}^{j_2},$$

with $r \in \mathbb{N}$, $j_2 + r \le [\log_2 N]$, where the parameter $r = r(N)$ grows with the sample size.
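To make the objects above concrete, here is a minimal sketch of the max-spectrum computation: dyadic block maxima $D(j,k)$, their averaged logarithms $Y_j$, and a slope fit over a range of scales. It assumes strictly positive data, and it uses an ordinary least-squares fit as a simple stand-in for the generalized least-squares estimator of the paper; the function names are hypothetical.

```python
import numpy as np

def max_spectrum(x):
    """Return scales j = 1..floor(log2 N) and Y_j = average of log2 D(j, k),
    where D(j, k) is the maximum of the k-th block of 2^j consecutive values."""
    block_max = np.asarray(x, dtype=float)
    n_scales = int(block_max.size).bit_length() - 1   # floor(log2 N)
    scales, spectrum = [], []
    for j in range(1, n_scales + 1):
        n_blocks = block_max.size // 2                # N_j = [N / 2^j]
        # maxima at scale j are pairwise maxima of the maxima at scale j - 1
        block_max = block_max[: 2 * n_blocks].reshape(n_blocks, 2).max(axis=1)
        scales.append(j)
        spectrum.append(np.mean(np.log2(block_max)))
    return np.array(scales), np.array(spectrum)

def slope_estimate(scales, spectrum, j1, j2):
    """OLS slope and intercept of Y_j on j for j1 <= j <= j2 (slope estimates H = 1/alpha).
    Assumption: plain OLS replaces the GLS weighting used in the paper."""
    mask = (scales >= j1) & (scales <= j2)
    slope, intercept = np.polyfit(scales[mask], spectrum[mask], 1)
    return slope, intercept

# Usage on simulated standard alpha-Frechet data (inverse-c.d.f. sampling).
rng = np.random.default_rng(0)
alpha = 2.0
x = (-np.log(1.0 - rng.random(2 ** 16))) ** (-1.0 / alpha)
j, y = max_spectrum(x)
H_hat, _ = slope_estimate(j, y, 1, 10)
print("H_hat =", H_hat, " alpha_hat =", 1.0 / H_hat)
```

Restricting the fit to moderate scales (here $j \le 10$) avoids the largest scales, where very few block maxima are available and $Y_j$ is noisy.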

The following result shows that the mean and the covariance matrix of the vector $Y_r$ are
asymptotically equivalent to the mean and the covariance matrix in the case where the
$X(i)$'s are $\alpha$-Fréchet (see Proposition 2.1).
Proposition 4.1 Suppose that the c.d.f. $F$ has the representation (3.1) and satisfies Conditions
3.1 and 3.2 above.
Then,

$$E Y_{j+r} - \mu_r(j) = O(1/2^{r\beta/\alpha}), \quad \text{as } r \to \infty,$$ (4.1)
and for any fixed $j_1 \le i \le j \le j_2$, $i, j \in \mathbb{N}$, we have

$$N_{j_2+r}\,\mathrm{Cov}(Y_{i+r}, Y_{j+r}) - \alpha^{-2}\Sigma_1(i,j) = O(1/2^{r\beta/\alpha}) + O(2^{r}/N), \quad \text{as } r \to \infty.$$ (4.2)

Here

$$\mu_r(j) := (j + r)/\alpha + C(\sigma_0, \alpha) \quad \text{and} \quad \Sigma_1(i,j) = 2^{i \vee j - j_2}\,\psi(|i - j|),$$ (4.3)
where the function $\psi$ is defined in (2.15) and where $C(\sigma_0, \alpha)$ is as in (2.12).

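Proof: Observe that by (2.7), we have $E Y_{j+r} = E\log_2(D(j+r, 1)) = E\log_2\big(\bigvee_{i=1}^{2^{j+r}} X(i)\big)$.
Therefore,

$$E Y_{j+r} - (j + r)/\alpha - E\log_2(\sigma_0 Z) = E\log_2\Big(2^{-(j+r)/\alpha}\bigvee_{i=1}^{2^{j+r}} X(i)\Big) - E\log_2(\sigma_0 Z) = E\log_2(M_n) - E\log_2(\sigma_0 Z),$$ (4.4)

where $M_n := n^{-1/\alpha}\bigvee_{i=1}^{n} X(i)$ and where $n := 2^{j+r}$. Corollary 3.1 implies that the right-hand
side of (4.4) is of order $O(n^{-\beta/\alpha}) = O(2^{-(j+r)\beta/\alpha}) = O(2^{-r\beta/\alpha})$, as $r \to \infty$, which in
turn implies (4.1).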

We now focus on proving (4.2). Let $i \le j$ and recall that $N_{j+r} = [N/2^{j+r}]$, and $N_{i+r} = [N/2^{i+r}]$. We also have that

$$D(j + r, k) = \bigvee_{\ell=1}^{2^{j-i}} D\big(i + r, 2^{j-i}(k - 1) + \ell\big), \quad \text{for all } k = 1, \ldots, N_{j+r}.$$ (4.5)

Note that $2^{j-i}N_{j+r} \le N_{i+r}$ and therefore, as in the proof of Proposition 2.1 above, we get

$$\mathrm{Cov}(Y_{j+r}, Y_{i+r}) = \frac{1}{N_{j+r}N_{i+r}}\sum_{k_1=1}^{N_{j+r}}\sum_{k_2=1}^{N_{i+r}} \mathrm{Cov}\big(\log_2 D(j + r, k_1), \log_2 D(i + r, k_2)\big)$$

$$= \frac{1}{N_{j+r}N_{i+r}}\sum_{k_1=1}^{N_{j+r}}\sum_{\ell=1}^{2^{j-i}} \mathrm{Cov}\big(\log_2 D(j + r, k_1), \log_2 D(i + r, 2^{j-i}(k_1 - 1) + \ell)\big).$$

The second sum in the last expression involves only terms $D(i + r, 2^{j-i}(k_1 - 1) + \ell)$, for $\ell = 1, \ldots, 2^{j-i}$, since in view of (4.5), the independence of the $D(i + r, k)$'s implies that $\mathrm{Cov}(D(j + r, k_1), D(i + r, k_2)) = 0$, for all $k_2$ outside the range $2^{j-i}(k_1 - 1) + \ell$, $\ell = 1, \ldots, 2^{j-i}$.

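Now, by using the stationarity of the $D(i + r, k)$'s and Relation (4.5) again, we obtain from
the last relation that

$$\mathrm{Cov}(Y_{j+r}, Y_{i+r}) = \frac{2^{j-i}}{N_{i+r}}\mathrm{Cov}\Big(\log_2\Big(\bigvee_{\ell=1}^{2^{j-i}} D(i + r, \ell)\Big), \log_2 D(i + r, 1)\Big)$$

$$= \frac{2^{j-i}}{N_{i+r}}\mathrm{Cov}\Big(\log_2\big(M_n^X \vee (2^{j-i} - 1)^{1/\alpha}M_m^Y\big), \log_2(M_n^X)\Big),$$ (4.6)

where $n := 2^{i+r}$ and $m := (2^{j-i} - 1)n$, with $M_n^X := n^{-1/\alpha}D(i + r, 1) = n^{-1/\alpha}\bigvee_{\ell=1}^{n} X(\ell)$ and

$$M_m^Y := m^{-1/\alpha}\bigvee_{\ell=2}^{2^{j-i}} D(i + r, \ell) = m^{-1/\alpha}\bigvee_{\ell=n+1}^{n+m} X(\ell) \stackrel{d}{=} M_m^X.$$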

Observe that the normalized maxima $M_n^X$ and $M_m^Y$ are independent since they involve maxima
of disjoint sets of $X(i)$'s. Thus, by combining the results of Corollaries 3.1 and 3.2 we obtain
that

$$\mathrm{Cov}\Big(\log_2\big(M_n^X \vee (2^{j-i} - 1)^{1/\alpha}M_m^Y\big), \log_2(M_n^X)\Big) - \alpha^{-2}\psi(|i - j|) = O(1/2^{r\beta/\alpha}), \quad \text{as } r \to \infty,$$ (4.7)

where $\psi$ is as in (2.15). Now, note that $N_{i+r} = 2^{j_2-i}N_{j_2+r} + q$, where $q < 2^{j_2-i}$, $q \in \mathbb{N}$. This
follows from the facts that $N_{i+r} = [N/2^{i+r}]$, $i = j_1, \ldots, j_2$, and $i \le j_2$. Thus

$$\frac{N_{j_2+r}}{N_{i+r}} - 2^{i-j_2} = O(1/N_r) = O(2^{r}/N).$$ (4.8)

Now, by applying Relations (4.7) and (4.8) to (4.6), we obtain (4.2). This completes the proof
of the proposition. □

The following theorem is the main result of the section. It establishes the uniform convergence
of the vector $Y_r$ to a normal vector and provides bounds on its rate of convergence. The
asymptotic normality of the estimators defined in (2.13) is then an immediate consequence of
this result (see Corollary 4.1 below).
Theorem 4.1 Suppose that the c.d.f. $F$ has the representation (3.1) and satisfies Conditions
3.1 and 3.2 above. Let $\theta = \{\theta_j\}_{j=j_1}^{j_2} \in \mathbb{R}^{m} \setminus \{0\}$, $m = j_2 - j_1 + 1$, be an arbitrary fixed, non-zero
vector and consider the linear combination $(\theta, Y_r) := \sum_{j=j_1}^{j_2} \theta_j Y_{j+r}$.
Then,

$$\sup_{x \in \mathbb{R}}\Big|P\big\{\sqrt{N_{j_2+r}}\,\big((\theta, Y_r) - (\theta, \mu_r)\big) \le x\big\} - \Phi(x/\sigma_\theta)\Big| \le C_\theta\big(1/2^{r\beta/\alpha} + r2^{r/2}/\sqrt{N}\big),$$ (4.9)

where $\Phi$ stands for the standard Normal c.d.f. and where $C_\theta > 0$ does not depend on $N$.
Here $N_j = [N/2^{j}]$ denotes the number of coefficients $D(j, k)$ available on scale $j$, $(\theta, \mu_r) := \sum_{j=j_1}^{j_2} \theta_j\mu_r(j)$ and

$$\sigma_\theta^2 = \alpha^{-2}(\theta, \Sigma_1\theta) := \alpha^{-2}\sum_{i,j=j_1}^{j_2} \theta_i\Sigma_1(i,j)\theta_j > 0.$$ (4.10)

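Proof: Since $N_i = [N/2^{i}]$, $i = 1, \ldots, [\log_2 N]$, for all $j = j_1, \ldots, j_2$, and $r \in \mathbb{N}$, $r \le [\log_2 N] - j_2$, we have $N_{j+r} = 2^{j_2-j}N_{j_2+r} + q_j$, where $0 \le q_j < 2^{j_2-j}$, $q_j \in \mathbb{N}$. Thus, for all
$j = j_1, \ldots, j_2$,

$$Y_{j+r} = \frac{1}{N_{j+r}}\sum_{k=1}^{N_{j_2+r}}\sum_{i=1}^{2^{j_2-j}} \log_2 D\big(j + r, 2^{j_2-j}(k - 1) + i\big) + \frac{1}{N_{j+r}}\sum_{i=1}^{q_j} \log_2 D\big(j + r, 2^{j_2-j}N_{j_2+r} + i\big)$$

$$=: \frac{1}{N_{j_2+r}}\sum_{k=1}^{N_{j_2+r}} y_{j+r}(k) + R_j,$$ (4.11)

where $y_{j+r}(k) := N_{j_2+r}N_{j+r}^{-1}\sum_{i=1}^{2^{j_2-j}} \log_2 D\big(j + r, 2^{j_2-j}(k - 1) + i\big)$.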

Therefore,

$$(\theta, Y_r) = \frac{1}{N_{j_2+r}}\sum_{k=1}^{N_{j_2+r}} \xi_r(k) + (\theta, R),$$ (4.12)

where $\xi_r(k) := (\theta, y_r(k))$, $k = 1, \ldots, N_{j_2+r}$, with $y_r(k) = \{y_{j+r}(k)\}_{j=j_1}^{j_2}$ and $R = \{R_j\}_{j=j_1}^{j_2}$.

Observe that the random vectors $y_r(k)$, $k = 1, \ldots, N_{j_2+r}$, are i.i.d. and independent from the
remainder term $(\theta, R)$. Indeed, this follows from the fact that the $X(i)$'s are i.i.d. and because
for any $j = j_1, \ldots, j_2$, the random variable $y_{j+r}(k)$ depends only on the $X(i)$'s with indices
$2^{j_2+r}(k - 1) + 1 \le i \le 2^{j_2+r}k$, $k = 1, \ldots, N_{j_2+r}$, and $R_j$ depends on the $X(i)$'s with indices
$2^{j_2+r}N_{j_2+r} + 1 \le i \le N$.
Thus, to prove (4.9), we proceed in two steps. First, we apply the Central Limit Theorem to
the first term on the right-hand side (r.h.s.) of (4.12). Then, we will argue that the remainder
term therein can be neglected.

Step 1. Note that the $\xi_r(k)$'s are i.i.d. but their distributions depend on $r$ and hence the
ordinary C.L.T. does not apply. The Berry-Esseen bound, however (see e.g. Theorem V.2.4
in Petrov (1995)), implies that

$$\sup_{x \in \mathbb{R}}\big|Q_{N,r}(x) - \Phi(x)\big| \le \frac{A\,E|\xi_r(1) - E\xi_r(1)|^{3}}{\sigma_{\xi_r}^{3}\sqrt{N_{j_2+r}}},$$ (4.13)

where

$$Q_{N,r}(x) := P\Big\{\frac{1}{\sigma_{\xi_r}\sqrt{N_{j_2+r}}}\sum_{k=1}^{N_{j_2+r}}\big(\xi_r(k) - E\xi_r(k)\big) \le x\Big\},$$

$\Phi(x)$ denotes the standard Normal c.d.f., and where $A > 0$ is an absolute constant. This is so,
provided that the variance $\sigma_{\xi_r}^2 := \mathrm{Var}(\xi_r(1))$ and the third moment $E|\xi_r(1)|^{3}$ of the $\xi_r(k)$'s are
finite.

Observe first that, by (4.12) and by the independence of the $\xi_r(k)$'s from $R$,

$$\sigma_{\xi_r}^2 = N_{j_2+r}\big(\mathrm{Var}(\theta, Y_r) - \mathrm{Var}(\theta, R)\big) = \sigma_\theta^2 + O(1/2^{r\beta/\alpha}) + O(2^{r}/N),$$ (4.14)

where $\sigma_\theta$ is as in (4.10). Indeed, this follows from Proposition 4.1 above, provided that $\mathrm{Var}(\theta, R)$
is negligible. In view of (4.11), however, since $0 \le q_j < 2^{j_2-j} \le 2^{j_2}$, $j = j_1, \ldots, j_2$,

Yar{9,R) < V Var(log2 £'(i + r, 1)) 



2 019 



Var(log2(2-(^+^)/-D(i + r, 1))), (4.15) 



where $m = j_2 - j_1 + 1$. In the last relation, we used the inequality $\mathrm{Var}(\eta_1 + \cdots + \eta_m) \le m^{2}(\mathrm{Var}(\eta_1) + \cdots + \mathrm{Var}(\eta_m))$, $m \in \mathbb{N}$, and the fact that

$$\mathrm{Var}(\log_2 D(j + r, 1)) = \mathrm{Var}\big(\log_2(2^{-(j+r)/\alpha}D(j + r, 1))\big).$$

In view of (2.5), however, by Corollary 3.1, the variances on the r.h.s. of (4.15) are
bounded, as $r \to \infty$. This implies that $\mathrm{Var}(\theta, R) = O(2^{r}/N)$, which completes the proof of (4.14).

We now focus on bounding the term $E|\xi_r(1) - E\xi_r(1)|^{3}$ in (4.13). The inequality



^ -XiT < ?n°^^^"^^ ^ \xi\P, m G N, vahd for all p, e M, z = 1, . . . , m, (4.16) 



r 

=1 i=l 



implies 

32 



j=ji 

< m^Yl 1^1'^ E log2 ^(J + r, i) - Elog2 + r, 1) 



^ ^ E ^1 ^°g2 ^(i + r, i) - E log2 + r, 1) 

j=ii «=i 

2 



\Oj\'n log2 ^(i + r, 1) - Elog2 + r, (4.17) 



= m 

3=31 

where $m = j_2 - j_1 + 1$ and where the last bound follows from Jensen's inequality. As in
(4.15) above, we have that $\log_2 D(j + r, 1) - E\log_2 D(j + r, 1)$ equals

$$\log_2\big(2^{-(j+r)/\alpha}D(j + r, 1)\big) - E\log_2\big(2^{-(j+r)/\alpha}D(j + r, 1)\big).$$

Therefore, by using inequality (4.16), we get that the r.h.s. of (4.17) is bounded above by

$$4m^{2}\sum_{j=j_1}^{j_2} |\theta_j|^{3}\Big(E\big|\log_2(2^{-(j+r)/\alpha}D(j + r, 1))\big|^{3} + \big(E\big|\log_2(2^{-(j+r)/\alpha}D(j + r, 1))\big|\big)^{3}\Big).$$

The last term is bounded, as $r \to \infty$, in view of (2.5) and Corollary 3.1.

We have thus far shown that (4.13) holds with the r.h.s. being of order $O(1/\sqrt{N_r})$, uniformly
in $r$, that is,

$$\sup_{x \in \mathbb{R}}\big|Q_{N,r}(x) - \Phi(x)\big| \le C_\theta/\sqrt{N_r} = O\big(2^{r/2}/\sqrt{N}\big).$$ (4.18)
We will now use this fact to prove (4.9).

Step 2. By (4.12), the probability in (4.9) equals

$$E\,Q_{N,r}\Big(x/\sigma_{\xi_r} - \sqrt{N_{j_2+r}}\big((\theta, R) + E\xi_r(1) - (\theta, \mu_r)\big)/\sigma_{\xi_r}\Big) =: E\,Q_{N,r}\big(x/\sigma_{\xi_r} - \Delta_{N,r}\big).$$ (4.19)
Indeed, this follows from the independence of the $\xi_r(k)$'s and the remainder term $R$.

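Now, by applying the triangle inequality, we obtain that the l.h.s. of (4.9) is bounded above
by:

$$\sup_{x \in \mathbb{R}} E\big|Q_{N,r}(x/\sigma_{\xi_r} - \Delta_{N,r}) - \Phi(x/\sigma_{\xi_r} - \Delta_{N,r})\big| + \sup_{x \in \mathbb{R}} E\big|\Phi(x/\sigma_{\xi_r} - \Delta_{N,r}) - \Phi(x/\sigma_{\xi_r})\big|$$

$$+ \sup_{x \in \mathbb{R}}\big|\Phi(x/\sigma_{\xi_r}) - \Phi(x/\sigma_\theta)\big| =: A_1 + A_2 + A_3.$$ (4.20)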



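In view of (4.18), we have that

$$A_1 \le \sup_{x \in \mathbb{R}}\big|Q_{N,r}(x) - \Phi(x)\big| = O\big(2^{r/2}/\sqrt{N}\big),$$ (4.21)

as $N \to \infty$ and $r \to \infty$.

Now, focus on the term $A_2$ in (4.20). By using the mean value theorem, for any $a < b$, $a, b \in \mathbb{R}$, we have that $|\Phi(a) - \Phi(b)| \le |a - b|/\sqrt{2\pi}$. Therefore (see (4.19)),

$$A_2 \le \frac{1}{\sqrt{2\pi}}E|\Delta_{N,r}| \le \frac{\sqrt{N_{j_2+r}}}{\sqrt{2\pi}\,\sigma_{\xi_r}}\Big(E|(\theta, R)| + |E\xi_r(1) - (\theta, \mu_r)|\Big).$$ (4.22)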

As argued above, in view of (4.11), we obtain by the triangle inequality, that

$$E|(\theta, R)| \le \sum_{j=j_1}^{j_2} |\theta_j|\frac{q_j}{N_{j+r}}E|\log_2 D(j + r, 1)| \le \sum_{j=j_1}^{j_2} |\theta_j|\frac{q_j}{N_{j+r}}\Big(E\big|\log_2(2^{-(j+r)/\alpha}D(j + r, 1))\big| + (j + r)/\alpha\Big) = O(r/N_r).$$ (4.23)

The last relation follows by adding and subtracting the term $(j + r)/\alpha$, and by applying
Corollary 3.1 to the terms $E|\log_2(2^{-(j+r)/\alpha}D(j + r, 1))|$.

By (4.12), $E\xi_r(1) = E(\theta, Y_r) - E(\theta, R)$ and thus by applying the triangle inequality, Proposition 4.1
and Relation (4.23) to the second term in the r.h.s. of (4.22), we obtain

A2 < const y/N^(r/Nr + 1/2''^/") = o(r2''^^/N^ + o(l/2'''^/''^ . (4.24) 

Here, we also used the fact that $\sigma_{\xi_r} \to \sigma_\theta$, $\sigma_\theta > 0$, as $r \to \infty$ (see (4.14) above).

Consider now the term $A_3$ in (4.20). As above, by using the mean value theorem, we obtain

$$A_3 \le \mathrm{const}\,\big|1/\sigma_{\xi_r} - 1/\sigma_\theta\big| = \mathrm{const}\,\frac{|\sigma_\theta - \sigma_{\xi_r}|}{\sigma_\theta\sigma_{\xi_r}} = O(1/2^{r\beta/\alpha}) + O(2^{r}/N),$$ (4.25)

as $r \to \infty$ and $N \to \infty$, where the last relation follows from Relation (4.14) above and
the fact that $\sigma_\theta^2 - \sigma_{\xi_r}^2 = (\sigma_\theta - \sigma_{\xi_r})(\sigma_\theta + \sigma_{\xi_r})$.

Now, by combining the bounds in Relations (4.21), (4.24), and (4.25) we obtain (4.9).
This completes the proof of the theorem. □

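Let now the scales $j_1 < j_2$ be fixed and let $r = r(N) \in \mathbb{N}$, $r + j_2 \le [\log_2 N]$. Theorem
4.1 shows that one can obtain consistent and asymptotically normal estimators of $H$ and
$C = C(\sigma_0, \alpha)$, as in the ideal Fréchet case (2.13). Indeed, let $A = (a\ b)$ be as in (2.13) and
define $\hat\theta = (\hat H_{\Sigma_1}, \hat C_{\Sigma_1})$ as in (2.13), with $\alpha^{-2}\Sigma_1$ being the asymptotic covariance matrix in
Proposition 4.1.

By using (2.13), one can show that

$$\hat H := \hat H_{\Sigma_1} = \sum_{j=j_1}^{j_2} w_j Y_{j+r} \quad \text{and} \quad \hat C := \hat C_{\Sigma_1} = \sum_{j=j_1}^{j_2} v_j Y_{j+r} - r\hat H,$$ (4.26)

where the $w_j$'s and the $v_j$'s are fixed weights such that

$$\sum_{j=j_1}^{j_2} w_j = 0, \quad \sum_{j=j_1}^{j_2} jw_j = 1, \quad \sum_{j=j_1}^{j_2} v_j = 1, \quad \sum_{j=j_1}^{j_2} jv_j = 0.$$ (4.27)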

The following result establishes the asymptotic normality of these estimators. 

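Proposition 4.2 Assume the conditions of Theorem 4.1 hold. If $r = r(N) \in \mathbb{N}$ is such that
$r2^{r/2}/\sqrt{N} + 1/2^{r\beta/\alpha} \to 0$, as $N \to \infty$, then for the estimators defined in (4.26), we have

$$\sqrt{N_{j_2+r}}\,(\hat H - H) \stackrel{d}{\longrightarrow} \mathcal{N}(0, H^{2}c_w) \quad \text{and} \quad \frac{\sqrt{N_{j_2+r}}}{r}(\hat C - C) \stackrel{d}{\longrightarrow} \mathcal{N}(0, H^{2}c_w),$$ (4.28)

as $N \to \infty$, where $c_w = \sum_{i,j=j_1}^{j_2} w_iw_j\Sigma_1(i,j)$ and where $C = C(\sigma_0, \alpha)$ is as in (2.12).
Moreover,

$$\lim_{N \to \infty} N_{j_2+r}\mathrm{Var}(\hat H) = \lim_{N \to \infty} r^{-2}N_{j_2+r}\mathrm{Var}(\hat C) = H^{2}c_w.$$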

Proof: The first convergence in (4.28) follows directly from Theorem 4.1 by setting $\theta_j := w_j$, $j = j_1, \ldots, j_2$. Indeed, since $\mu_r(j) = (j + r)/\alpha + C$, Relation (4.27) implies that

$$(\theta, \mu_r) = \sum_{j=j_1}^{j_2} w_j\big((j + r)/\alpha + C\big) = 1/\alpha = H.$$

Thus, for $\hat H = (\theta, Y_r) = \sum_{j=j_1}^{j_2} w_jY_{j+r}$, by Relation (4.9) we obtain that

$$\sup_{x \in \mathbb{R}}\Big|P\big\{\sqrt{N_{j_2+r}}\,(\hat H - H) \le x\big\} - \Phi(x/\sigma_\theta)\Big| \longrightarrow 0,$$
as $N \to \infty$. This implies the asymptotic normality of $\hat H$ in (4.28), where in view of (4.10),
$\sigma_\theta^2 = H^{2}(w, \Sigma_1w) = H^{2}\sum_{i,j=j_1}^{j_2} w_iw_j\Sigma_1(i,j)$.

We now focus on the estimator $\hat C$. By setting $\theta_j := v_j$, $j = j_1, \ldots, j_2$, we get by using (4.27)
that

$$(\theta, \mu_r) = \sum_{j=j_1}^{j_2} \big((j + r)/\alpha + C\big)v_j = rH + C.$$

On the other hand, in view of (4.26),

$$(\theta, Y_r) = \sum_{j=j_1}^{j_2} v_jY_{j+r} = \hat C + r\hat H$$

and thus

$$\hat C - C = (\theta, Y_r) - (\theta, \mu_r) - r(\hat H - H).$$ (4.29)

We have already shown that the term $(\hat H - H)$ above is asymptotically normal and by Theorem
4.1 the term $(\theta, Y_r) - (\theta, \mu_r)$ in (4.29) is also asymptotically normal. Since $r = r(N) \to \infty$, the
second term in the r.h.s. of (4.29) dominates in the limit. This implies the second convergence
in (4.28).

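To complete the proof, observe that by Proposition 4.1, $N_{j_2+r}\mathrm{Var}(\hat H) \to H^{2}c_w$, as
$N \to \infty$. We now consider the variance of $\hat C - C$ in (4.29), and apply the inequality

$$\mathrm{Var}(\xi) - 2(\mathrm{Var}(\xi)\mathrm{Var}(\eta))^{1/2} + \mathrm{Var}(\eta) \le \mathrm{Var}(\xi - \eta) \le \mathrm{Var}(\xi) + 2(\mathrm{Var}(\xi)\mathrm{Var}(\eta))^{1/2} + \mathrm{Var}(\eta)$$

with $\xi := (\theta, Y_r) - (\theta, \mu_r)$ and $\eta := r(\hat H - H)$. Since $\mathrm{Var}(\eta)$ dominates $\mathrm{Var}(\xi)$ in the limit, we
obtain that $r^{-2}N_{j_2+r}\mathrm{Var}(\hat C) \to H^{2}c_w$. □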

Corollary 4.1 Assume the conditions of Theorem 4.1 hold. Define the estimators

$$\hat\alpha := 1/\hat H \quad \text{and} \quad \hat\sigma_0 := 2^{\hat C - \hat H\,E\log_2 Z},$$

where $Z$ is a 1-Fréchet random variable with unit scale coefficient. Then with $r = r(N)$ as in
Proposition 4.2 we have

$$\sqrt{N_{j_2+r}}\,(\hat\alpha - \alpha) \stackrel{d}{\longrightarrow} \mathcal{N}(0, \alpha^{2}c_w) \quad \text{and} \quad \frac{\sqrt{N_{j_2+r}}}{r}(\hat\sigma_0 - \sigma_0) \stackrel{d}{\longrightarrow} \mathcal{N}\big(0, (\ln 2)^{2}\sigma_0^{2}\alpha^{-2}c_w\big).$$ (4.30)
This result follows from Proposition 4.2 by an application of the Delta-method.
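For the reader's convenience, the Delta-method step for the tail exponent estimator can be written out as follows. This is only a sketch, using the map $g(h) = 1/h$ and the limit law of $\hat H$ stated in (4.28); it recovers the variance $\alpha^{2}c_w$ appearing in (4.30).

```latex
\begin{align*}
g(h) &= 1/h, \qquad g'(H) = -1/H^{2} = -\alpha^{2},\\
\sqrt{N_{j_2+r}}\,(\hat\alpha - \alpha)
  &= \sqrt{N_{j_2+r}}\,\bigl(g(\hat H) - g(H)\bigr)
   \approx g'(H)\,\sqrt{N_{j_2+r}}\,(\hat H - H),\\
\text{asymptotic variance}
  &= g'(H)^{2}\,H^{2}c_w = \alpha^{4}\,\alpha^{-2}\,c_w = \alpha^{2}c_w .
\end{align*}
```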

Most heavy-tailed distributions used in applications satisfy Condition 3.1 but some do not
satisfy Condition 3.2. Indeed, (3.4) implies that $E|X|^{p}1_{\{|X| \le 1\}} < \infty$, for all $p \in \mathbb{R}$, which is
rather stringent. Nevertheless, the results of Proposition 4.2 and Corollary 4.1 continue to
hold even if Condition 3.2 is not satisfied and even if the $X(i)$'s can take negative values. This
is so, because block-maxima become strictly positive as the block-size grows. We make this
more precise in Proposition 4.3 below.

Now, for convenience, introduce a special value $*$ and suppose that our statistics take values
in the extended real line $\mathbb{R}_* := \mathbb{R} \cup \{*\}$. If a statistic is not well-defined (because it involves
$\log_2 x$ for $x \le 0$, for example), we assign to it the special value $*$. The set $\{*\} \subset \mathbb{R}_*$ is considered
as both closed and open in the topology of $\mathbb{R}_*$ and the topology of $\mathbb{R} \subset \mathbb{R}_*$ is the same as that
of the real line. Therefore, the statistics $Y_j$ in (2.7) and the estimators $\hat H$ and $\hat C$ in (4.26)
become proper random variables which can sometimes take the value $*$ if some of the $X(i)$'s
are negative.

The following result shows that, asymptotically, the estimators $\hat H$ and $\hat C$ become real-valued
with probability one, provided that $\ln(N)/2^{r(N)} \to 0$, as $N \to \infty$.

Proposition 4.3 Suppose that the c.d.f. $F$ has the representation (3.1) and satisfies Condition
3.1, where $F(0)$ is not necessarily zero. Let also $r = r(N) \in \mathbb{N}$, $\hat H$ and $\hat C$ be as in (4.26). If
$\ln(N)/2^{r(N)} \to 0$, $N \to \infty$, then

$$P(\{\hat H = *\}) + P(\{\hat C = *\}) \longrightarrow 0, \quad \text{as } N \to \infty.$$ (4.31)

If in addition $r2^{r/2}/\sqrt{N} + 1/2^{r\beta/\alpha} \to 0$, as $N \to \infty$, then the convergences (4.28) and (4.30)
continue to hold.

Proof: Let $X(i)$, $i \in \mathbb{N}$, be i.i.d. with c.d.f. $F$ and let $x_0 > 0$ be arbitrary. Define the
truncated variables $\tilde X(i) := X(i)1_{\{X(i) > x_0\}} + x_01_{\{X(i) \le x_0\}}$, $i \in \mathbb{N}$, and observe that they are
i.i.d. with c.d.f. $\tilde F(x) := F(x)$, $x \ge x_0$, and $\tilde F(x) = 0$, $x < x_0$. Thus, $\tilde F(x)$ has a representation
as in (3.1) with the function $\sigma^\alpha(x)$ replaced by

$$\tilde\sigma^\alpha(x) = \infty\,1_{(0,x_0)}(x) + \sigma^\alpha(x)1_{[x_0,\infty)}(x),$$

where $\sigma^\alpha(x)$ is the function involved in the corresponding representation of $F(x)$.

Consider the statistics $\tilde D(j,k)$ and $\tilde Y_j$ defined as in (2.5) and (2.7) with $X(i)$'s replaced
by $\tilde X(i)$'s. Let also $\tilde H$ and $\tilde C$ be the corresponding statistics defined as in (4.26) with $Y_j$'s
replaced by $\tilde Y_j$'s. Observe that $\tilde F$ satisfies Condition 3.1 and also trivially Condition 3.2 since
$x_0 > 0$ and $\tilde\sigma^\alpha(x) = \infty$ for all $x \in (0, x_0)$. Therefore, the results of Proposition 4.2 apply to the
statistics $\tilde H$ and $\tilde C$. We will now show that the statistics $\hat H$ and $\hat C$, which may not always be
real-valued random variables (i.e. can take the special value $*$), coincide with the statistics $\tilde H$
and $\tilde C$, eventually.

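Let $1 \le j_0 \le \log_2 N$, $j_0 \in \mathbb{N}$. Observe that the event

$$\mathcal{C}_{j_0} := \{\tilde D(j_0, k) = D(j_0, k),\ k = 1, \ldots, N_{j_0}\}$$

implies the events $\mathcal{C}_j = \{\tilde D(j,k) = D(j,k),\ k = 1, \ldots, N_j\}$, for all $j_0 \le j \le \log_2 N$, and
in particular the events $\{\tilde Y_j = Y_j\}$, $j \ge j_0$. Thus, the statistics $\tilde H$ and $\hat H$ (and $\tilde C$ and $\hat C$,
respectively) coincide on the event $\mathcal{C}_{j_1+r}$. Thus, to complete the proof of the proposition, it is
sufficient to show that $P(\mathcal{C}_{j_1+r}) \to 1$, as $N \to \infty$.
Let $j_0 := j_1 + r$ and observe that by independence,

$$P(\mathcal{C}_{j_0}) = P\{\tilde D(j_0, 1) = D(j_0, 1)\}^{N_{j_0}} = \big(1 - F(x_0)^{2^{j_0}}\big)^{N_{j_0}}.$$

In view of Condition 3.1, $p_0 := F(x_0) < 1$ and hence

$$\ln P(\mathcal{C}_{j_0}) = N_{j_0}\ln\big(1 - p_0^{2^{j_0}}\big) = -N_{j_0}p_0^{2^{j_0}}(1 + o(1)), \quad \text{as } j_0 \to \infty.$$

Since $p_0 < 1$, the first convergence in (4.31) implies that $Np_0^{2^{j_0}} \to 0$, as $N \to \infty$, and
hence $P(\mathcal{C}_{j_1+r(N)}) \to 1$, as $N \to \infty$. We have thus shown that (4.31) holds and, since $\hat H$ and $\hat C$ coincide with $\tilde H$ and $\tilde C$ with probability tending to one, that (4.28) continues to hold. Relation (4.30)
follows from (4.28) by using the Delta-method. □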
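The probability computed in the proof above is easy to evaluate numerically. The sketch below assumes only the formula $P(\mathcal{C}_{j_0}) = (1 - p_0^{2^{j_0}})^{N_{j_0}}$ with $N_{j_0} = [N/2^{j_0}]$; the function name and the values $N = 2^{20}$, $p_0 = 1/2$ are illustrative choices, not taken from the paper.

```python
import math

def prob_all_block_maxima_exceed(N, j0, p0):
    """Evaluate (1 - p0^(2^j0))^(N_j0), N_j0 = floor(N / 2^j0), on the log scale.
    p0 = F(x0) is assumed to lie in [0, 1)."""
    block = 2 ** j0
    n_blocks = N // block
    if n_blocks == 0:
        raise ValueError("scale j0 is too large for sample size N")
    q = p0 ** block                    # P(a single block maximum is <= x0)
    return math.exp(n_blocks * math.log1p(-q))

N, p0 = 2 ** 20, 0.5                   # hypothetical sample size and mass below x0
for j0 in range(1, 8):
    print(j0, prob_all_block_maxima_exceed(N, j0, p0))
```

Even when half of the observations fall below the truncation level, the probability that every block maximum at scale $j_0$ exceeds it approaches 1 after a handful of scales, which illustrates why the condition $\ln(N)/2^{r(N)} \to 0$ is mild.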

Remarks: 
1. Observe that in view of (2.13), $\hat H_{\Sigma_1} = \hat H_{\phi\Sigma_1}$ and $\hat C_{\Sigma_1} = \hat C_{\phi\Sigma_1}$ for any $\phi > 0$. That is,
one can compute, in practice, the generalized least squares estimators $\hat H$ and $\hat C$ without
having to use a plug-in estimator for $\alpha$ in (4.2) (see also the Remarks in Section 2.2).

2. The constants $c_w$ appearing in Proposition 4.2 and Corollary 4.1 are given in a table
below. We now comment on the optimal rate in these asymptotic results.
Proposition 4.1 indicates that the bias of the estimator $\hat H$ in (4.28) is of order $O(1/2^{r\beta/\alpha})$.
On the other hand, the standard error of $\hat H$ is of order $O(2^{r/2}/\sqrt{N})$. By balancing these
orders, we obtain that

$$2^{r} \propto N^{\alpha/(2\beta+\alpha)}$$

yields the optimal order of the mean squared error (m.s.e.) $E(\hat H - H)^{2}$, and a corresponding
rate of convergence of order $N^{-\beta/(2\beta+\alpha)}$, up to a factor logarithmic in $N$,
to the limit distribution of $\hat H$ in (4.28).

Hall (1982) (see Theorem 2 therein) obtained the same optimal order of convergence for
the Hill-type estimators under the following semi-parametric assumptions on the tail of
$F$:

$$1 - F(x) = c_1x^{-\alpha}\big(1 + c_2x^{-\beta} + o(x^{-\beta})\big), \quad \text{as } x \to \infty,\ \alpha, \beta > 0.$$ (4.32)

A Taylor expansion shows that this tail behavior corresponds to Condition 3.1 above in
the case when $0 < \beta < \alpha$. Note that in Hall (1982) the parameter $r$ corresponds to $N/2^{r}$
in our case.

Observe that Theorems 1 and 2 in Hall (1982) also involve asymptotic normality results
for the scale parameter $c_1$ in (4.32). These results are similar to those about $\hat C$ in
Proposition 4.2. Note in particular the presence of the factor $r = r(N)$, which is logarithmic in $N$.

3. The optimal rate in the previous remark may not be improved, in general. Indeed, by
Proposition 3.1 the rate of the bias is exact if $\sigma^\alpha(x) - \sigma_0^\alpha \sim c_1x^{-\beta}$, $x \to \infty$, $c_1 \ne 0$. This
is typically the case in practice (see the Examples above). Relation (4.2) also implies that
the order of the standard error of $\hat H$ is precisely $O(1/\sqrt{N_r})$, and cannot be improved.
Furthermore, the rate in the Berry-Esseen bound may not be improved, in general (see
e.g. Ch. V.2 in Petrov (1995)). Thus, the result of Theorem 4.1 is optimal in our setting.

4. Consider the case of optimal m.s.e. of $\hat H$, that is, $2^{r} \propto N^{\alpha/(2\beta+\alpha)}$. Observe that the r.h.s.
in (4.9) is, up to the factor $r(N)$, which is logarithmic in $N$, of the same order as the root-m.s.e.
$(E(\hat H - H)^{2})^{1/2}$. This indicates that the precision (in terms of coverage probability) of
the confidence intervals for $H$ based on the asymptotic distribution for $\hat H$ will be of order
at least $O(1/N^{\beta'/(2\beta+\alpha)})$ for any $\beta' \in (0, \beta)$.

5. Even though the estimators $\hat\alpha$ and $\hat\sigma_0$ in Corollary 4.1 are asymptotically normal, it is not
a good idea to use their asymptotic distributions to construct confidence intervals for $\alpha$
and $\sigma_0$. Indeed, for simplicity consider the ideal Fréchet case. In this case, the estimator
$\hat H$ is unbiased and hence the estimator $\hat\alpha = 1/\hat H$ is biased. Moreover, since the variance
of the random variable $1/X$, where $X$ has Normal distribution, is infinite, we expect that
$\mathrm{Var}(\hat\alpha)$ does not converge to the asymptotic variance of $\hat\alpha$ in (4.28). In our experience, the
distribution of $\hat\alpha$ tends to be skewed in practice. Therefore, one can get better confidence
interval estimates for $\alpha$ by using inversion from the corresponding confidence intervals for
$H$. For example, $\big((\hat H + z_p\hat H\sqrt{c_w}/\sqrt{N_{j_2+r}})^{-1}, (\hat H - z_p\hat H\sqrt{c_w}/\sqrt{N_{j_2+r}})^{-1}\big)$ is an asymptotically
correct $100(1 - p)\%$ confidence interval for $\alpha$, where $z_p := \Phi^{-1}(1 - p/2)$, $p \in (0, 1)$.
As indicated in the previous remark, the error in the coverage probability of this interval
is of order $O(1/N^{\beta'/(2\beta+\alpha)})$ for any $\beta' \in (0, \beta)$, if m.s.e.-optimal $r$'s are chosen.

5. Performance evaluation and data analysis 

5.1. Typical models: small and large sample properties 

We study the performance of the max self-similarity estimators when the data are heavy-tailed
but deviate from the ideal Fréchet case. Specifically, given a sample of size $N = 2^{n}$, $n \in \mathbb{N}$,
the GLS estimators $\hat H = \hat H(j_1, j_2)$ and $\hat\alpha = \hat\alpha(j_1, j_2) = 1/\hat H$ are computed for a range of scales
$j_1 \le j \le j_2$. We choose here $j_2 = n$ as the maximal available scale and focus on optimal $j_1$'s
in the sense of mean squared error. Namely, we let

$$j_1^{*} := \mathop{\mathrm{Argmin}}_{j_1,\ 1 \le j_1 \le j_2} E\big(\hat H(j_1, j_2) - H\big)^{2},$$ (5.1)

where the last expectation is computed from samples of independent realizations of the
estimators $\hat H$.
[Figure 3 panels: boxplots of max self-similarity and Hill estimators for Pareto data, $\alpha = 5$ (top) and $\alpha = 0.1$ (bottom).]

Fig 3. Boxplots of 1,000 independent realizations of max self-similarity and Hill estimators for different
sample sizes from Pareto distributions with $\alpha = 5$ (top panel) and $\alpha = 0.1$ (bottom panel) are shown. The
labels nM and nH correspond to sample size $2^{n}$ of max self-similarity and Hill estimators, respectively.
The Hill estimators were computed by using (1.2) with $k = 2^{n} - 1$, and the max self-similarity estimators
are based on a range of scales $j_1 \le j \le j_2 = n$, where $j_1$ was chosen to minimize the mean squared error.


We first compare the max self-similarity estimators to the classical Hill estimator over Pareto
data with unit scale, i.e. with c.d.f. $F(x) = 1 - x^{-\alpha}$, $x \ge 1$. In this case, the Hill estimator
corresponds to the maximum likelihood estimator. Figure 3 indicates that, as expected, the
Hill estimator outperforms the max self-similarity estimator. However, as seen from the
boxplots, the max self-similarity estimator works relatively well for small, moderate and large
samples and essentially keeps up with the Hill estimators. In fact, as the sample size grows the
max self-similarity estimator improves almost at the same rate as the Hill estimator. Here the
max self-similarity estimator was computed by using the range of scales $j_1^{*} \le j \le j_2$, where
$j_2 = \log_2 N$ and $j_1^{*}$ is as in (5.1).
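For reference, the Hill benchmark in this Pareto setting can be sketched as below. The code uses the standard form of the Hill estimator based on the $k$ largest order statistics, which is assumed here to agree with (1.2) (that formula is not reproduced in this section); the function name, the seed and the sample size are illustrative.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of alpha from the k largest order statistics (1 <= k < N).
    Assumes strictly positive data."""
    x = np.sort(np.asarray(x, dtype=float))
    if not 1 <= k < x.size:
        raise ValueError("k must satisfy 1 <= k < N")
    top = x[-k:]                 # the k largest observations
    threshold = x[-k - 1]        # the (k+1)-th largest observation
    return 1.0 / np.mean(np.log(top) - np.log(threshold))

# Unit-scale Pareto sample, F(x) = 1 - x^(-alpha), x >= 1, via inverse-c.d.f. sampling.
rng = np.random.default_rng(1)
alpha, n = 2.0, 12
x = (1.0 - rng.random(2 ** n)) ** (-1.0 / alpha)
alpha_hill = hill_estimator(x, k=2 ** n - 1)     # k = 2^n - 1, as in the Pareto comparison
print("Hill estimate of alpha:", alpha_hill)
```

With $k = 2^{n} - 1$ the estimator uses every observation above the sample minimum, which is why it behaves like the maximum likelihood estimator for exact Pareto data; for other distributions a much smaller $k$ would normally be needed.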



[Figure 4 panels: boxplots of max self-similarity and Hill estimators for Fréchet data, $\alpha = 5$ (top) and $\alpha = 0.1$ (bottom).]

Fig 4. Boxplots of 1,000 independent realizations of max self-similarity and Hill estimators for different
sample sizes from Fréchet distributions with $\alpha = 5$ (top panel) and $\alpha = 0.1$ (bottom panel) are
shown. The labels nM and nH correspond to sample size $2^{n}$ of max self-similarity and Hill estimators,
respectively. The Hill estimators were computed by using an optimal value for $k$ in (1.2), which yields the
smallest mean squared error. The max self-similarity estimators were computed from the entire range
of scales $j$.

In Figure 4 we compare the performance of the max self-similarity and the Hill estimators
for Fréchet data. The parameter $k$ in (1.2) of the Hill estimator was chosen to minimize the
mean squared error of the statistics $1/\hat\alpha_H(k)$, by analogy with (5.1). Now, the entire range of
scales $j_1 = 1 \le j \le j_2 = \log_2 N$ was used to compute the max self-similarity estimators. Observe
that, compared to the case of Pareto data (see Figure 3), the roles of the two estimators
are now reversed. As expected, the max self-similarity estimator works best in the Fréchet setting
and dominates the Hill estimator. In fact, the method of choosing the parameter $k$ here is
unusually favorable to the Hill estimator since it is not based on examining the Hill plot and
determining a range where it is constant. It is well known that in practice, the Hill plot is quite
volatile and the resulting choice of $k$ based on this plot would yield far more biased estimators
than the ones shown in Figure 4.

We now examine the max self-similarity estimators in more detail when the data are drawn
from a stable and a t-distribution. Tables ?? and ?? below indicate that the estimators
$\hat H_{\rm opt} := \hat H(j_1^{\rm opt}, j_2)$ work well in practice for a variety of sample sizes and parameter values.
Their performance is particularly good in the stable context. The performance in the case of
t-distributions is comparable with the stable cases when the heavy-tail exponent $\alpha$ is not
large. Notice that $\alpha$ corresponds to the degrees of freedom of the t-distribution and therefore
as $\alpha$ grows, the t-distribution gets closer to the Normal distribution. Although it is still heavy
tailed, most of the body of the distribution is not, and therefore the quality of the tail estimators
deteriorates.

Table ?? indicates that the max self-similarity estimator outperforms the Hill estimator for
stable distributions with $\alpha < 1$ and that the two estimators are comparable for $1 \le \alpha < 2$.
The Hill estimator is slightly better than or comparable to the max self-similarity one for the
t-distributions with low $\alpha$'s and slightly worse or comparable for moderate and large $\alpha$'s (Table
??).

The MSE-optimal choice of the parameter $k$ is unrealistically favorable to the Hill estimator.
In practice, these choices of $k$ typically do not correspond to constant regions in the Hill plot.
On the other hand, the MSE-optimal values of $j_1$ usually correspond to the knee in the
max-spectrum plot, which can be identified in practice (either visually or automatically). These
observations suggest that in reality the max self-similarity estimators are more reliable and
accurate than estimators based on the Hill plot.

5.2. On the selection of the scales $j_1$ and $j_2$

[Figure 5, left panel title: "Max self-similarity $\hat H = 1.031$ (0.073357), $\hat\alpha = 0.96994$".]

Fig 5. Mixtures of $\alpha$-Fréchet (10%) and Exponential of mean 5 (90%) were simulated. The
heavy-tail exponent is $\alpha = 1$ and the sample sizes are $N = 2^{17} = 131,072$. Left panel: max-spectrum of a
typical sample. Right panel: 1,000 independent replications of the GLS max self-similarity estimators
were obtained, where automatic selection for the parameter $j_1$ was used with $p = 0.01$ and $b = 4$. The
top-right graph shows a histogram of the resulting selections of $j_1$. The bottom-right graph shows the
root-mean squared error of the estimators $\hat H$ of $1/\alpha$. The top-left and top-right plots show histograms
of the $\alpha$ estimates obtained by using automatically selected $j_1$'s and with $j_1 = j_1^{\rm opt} = 10$, respectively.

In the ideal case of $\alpha$-Fréchet data, the max-spectrum plot of $Y_j$ is almost perfectly linear
in $j$ (see Figure 2). However, most real data sets deviate from the ideal case and thus the
max-spectrum becomes linear only over a range of relatively large scales $j$. The selection of an
appropriate range of scales $j_1 \le j \le j_2$, where the max self-similarity estimators are computed,
becomes an important practical problem. Because of (2.8), one can always choose $j_2 = [\log_2 N]$
to be the largest available scale and the scale $j_1$ can be chosen by visual inspection, a strategy
that works fairly well in practice. Nevertheless, we also propose an automatic procedure for
choosing the scale $j_1$, which also turns out to work well in practice. It relies on the following
simplifying assumptions:

Assumption 1. The vector $Y_j$, $j = 1, \ldots, j_2$ follows a multivariate Normal distribution.

Assumption 2. The covariance matrix $\Sigma_\alpha(1, j_2; N) = \alpha^{-2}\Sigma_1(1, j_2; N)$ of the vector
$Y = \{Y_j\}_{j=1}^{j_2}$ is given by (2.14).

These assumptions are valid asymptotically, provided that $N_{j_2} \to \infty$ (see Theorem 4.1 and the
accompanying proposition in Section 4). Since the $N_j$'s grow exponentially fast as $j$ decreases,
choosing $j_2$ as the largest available scale $[\log_2 N]$ is not critical in practice. Let now $\hat H(j_1, j_2)$
denote the GLS estimate of $H = 1/\alpha$, computed over the range of scales $j_1 \le j \le j_2$ as described
in Section 4 (see also the weights in (4.26)).

Tuning parameters:

Pick a relatively small significance threshold $p \in (0, 1)$ (e.g. $p = 0.1$ or $0.01$) and an
integer $b$ called the back-start parameter (e.g. $b = 3$ or $4$ for moderate sample sizes). Set
$j_2 := [\log_2 N]$ and $j_1 := \max\{1, j_2 - b\}$.

Step 1. If $j_1 = 1$ then stop, else calculate $\hat H_{\rm new} = \hat H(j_1 - 1, j_2)$ and $\hat H_{\rm old} = \hat H(j_1, j_2)$.

Step 2. Let $w_{\rm new}$ and $w_{\rm old}$ be vectors of weights as in (4.26), such that $\hat H_{\rm new} = (w_{\rm new}, Y)$ and
$\hat H_{\rm old} = (w_{\rm old}, Y)$, where $Y = \{Y_j\}_{j=1}^{j_2} \in \mathbb{R}^{j_2}$ and where the vectors $w_{\rm new}, w_{\rm old} \in \mathbb{R}^{j_2}$
are appropriately padded with zeros. Consider the quantity:

$$ s_1 := \Big( (w_{\rm new} - w_{\rm old})^t \, \Sigma_1(1, j_2; N) \, (w_{\rm new} - w_{\rm old}) \Big)^{1/2}. $$

Now, consider the approximate $(1 - p)$-level confidence interval for $\mathbb{E}(\hat H_{\rm new} - \hat H_{\rm old})$:

$$ \Big( \hat H_{\rm new} - \hat H_{\rm old} - z_{p/2} \hat H_{\rm old} s_1, \ \hat H_{\rm new} - \hat H_{\rm old} + z_{p/2} \hat H_{\rm old} s_1 \Big), $$

where $z_{p/2} = \Phi^{-1}(1 - p/2)$ is the $(1 - p/2)$-th quantile of the standard Normal distribution.

Step 3. If zero is contained in the confidence interval computed in Step 2, then set $j_1 := j_1 - 1$
and go to Step 1; otherwise stop and report the selected $j_1$ and $\hat\alpha := 1/\hat H_{\rm old}$.
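Steps 1 to 3 can be sketched in code as follows. This is a hedged illustration rather than the procedure exactly as analyzed in Stoev et al. (2006): instead of the full covariance matrix of (2.14), it assumes a diagonal stand-in in which the $j$-th entry of $\Sigma_1$ is $C_0 2^j/N$, where $C_0 = \pi^2/(6\ln^2 2) \approx 3.4237$ is the variance of $\log_2$ of a standard 1-Fréchet variable, and cross-scale covariances are ignored. The resulting weights are therefore weighted least squares weights, not the GLS weights of (4.26). The names `wls_weights` and `select_j1` are hypothetical.

```python
import math
import random
from statistics import NormalDist

# Variance of log2 of a standard 1-Frechet variable: pi^2 / (6 ln^2 2), about 3.4237.
C0 = math.pi ** 2 / (6.0 * math.log(2.0) ** 2)


def max_spectrum(x):
    """Y_j = mean of log2 block-maxima over blocks of size 2^j, j = 1..floor(log2 N)."""
    n = len(x)
    spectrum = []
    for j in range(1, n.bit_length()):
        m = 2 ** j
        n_blocks = n // m
        log_maxima = [math.log2(max(x[i * m:(i + 1) * m])) for i in range(n_blocks)]
        spectrum.append(sum(log_maxima) / n_blocks)
    return spectrum


def wls_weights(j1, j2):
    """Zero-padded weights w (length j2) with H_hat(j1, j2) = sum_j w[j-1] * Y_j.

    Assumption: Var(Y_j) is proportional to 2^j and cross-scale covariances
    are ignored (a diagonal stand-in for the GLS weights). Requires j1 < j2.
    """
    scales = range(j1, j2 + 1)
    prec = [2.0 ** (-j) for j in scales]  # precision is proportional to N_j
    j_bar = sum(p * j for p, j in zip(prec, scales)) / sum(prec)
    sxx = sum(p * (j - j_bar) ** 2 for p, j in zip(prec, scales))
    w = [0.0] * j2
    for p, j in zip(prec, scales):
        w[j - 1] = p * (j - j_bar) / sxx
    return w


def select_j1(y, n, p=0.01, b=4):
    """Automatic choice of j1 following Steps 1-3; returns (j1, alpha_hat)."""
    j2 = len(y)
    j1 = max(1, j2 - b)
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    sigma1_diag = [C0 * 2.0 ** j / n for j in range(1, j2 + 1)]

    def estimate(w):
        return sum(wj * yj for wj, yj in zip(w, y))

    while True:
        w_old = wls_weights(j1, j2)
        h_old = estimate(w_old)
        if j1 == 1:  # Step 1: no smaller scale left to include
            break
        w_new = wls_weights(j1 - 1, j2)
        h_new = estimate(w_new)
        # Step 2: s_1 and the half-width z * H_old * s_1 (abs() only guards H_old < 0).
        s1 = math.sqrt(sum((a - c) ** 2 * v for a, c, v in zip(w_new, w_old, sigma1_diag)))
        half_width = z * abs(h_old) * s1
        if abs(h_new - h_old) <= half_width:  # Step 3: zero lies in the interval
            j1 -= 1
        else:
            break
    return j1, 1.0 / h_old


# Mixture of 1-Frechet (10%) and Exponential with mean 5 (90%), as in the text.
rng = random.Random(1)
n = 2 ** 15
sample = [1.0 / rng.expovariate(1.0) if rng.random() < 0.1 else rng.expovariate(0.2)
          for _ in range(n)]
j1_hat, alpha_hat = select_j1(max_spectrum(sample), n)
print(f"selected j1 = {j1_hat}, alpha estimate = {alpha_hat:.3f}")
```

On an exactly linear max-spectrum the successive estimates coincide, zero always lies in the interval, and the loop runs down to $j_1 = 1$. A knee in the spectrum makes $\hat H_{\rm new}$ drift away from $\hat H_{\rm old}$ and stops the loop earlier.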

The choice of the tuning parameters $p$ and $b$ and the validity of the above simplifying
assumptions are addressed in Stoev et al. (2006). In Figure 5 we briefly demonstrate the performance
of the above automatic selection procedure for a mixture of an Exponential and an $\alpha$-Fréchet
distribution. Samples of size $N = 2^{17} = 131,072$ were generated and a level $p = 0.01$ and
back-start parameter $b = 4$ were employed. The left panel indicates the presence of a "knee" in the
max-spectrum plot in one such mixture sample. The automatic selection procedure identified
the location of the knee well by selecting $j_1 = 9$, and the resulting estimate $\hat\alpha = 0.97$ is rather
close to the nominal value of $\alpha = 1$. In the right panel, we demonstrate the performance of the
automatic selection procedure by using 1,000 independent replications of the mixture samples.
The histogram of the automatic choices for $j_1$ (left panel) indicates that most of the time
values close to the MSE-optimal one $j_1^{\rm opt} = 10$ were chosen. The histogram of the resulting
estimates of the heavy-tail exponent (top-right graph in the left panel) is similar to the
histogram corresponding to the MSE-optimal choice of $j_1$ (bottom-right in the left plot). The
slight bias in the histogram on the top-right is due to the fact that the automatic procedure often
chose values of $j_1$ slightly lower than the MSE-optimal one. More extensive analysis of
this procedure is presented in Stoev et al. (2006).

5.3. Data analysis 

We first discuss a popular insurance data set of 2,167 fire losses in Denmark from 1980 to
1990. This data set has been studied extensively; see e.g. McNeil (1997), Resnick (1997a), Lu
and Peng (2002) and Peng and Qi (2004).

Figure 6 displays the data, its corresponding Hill plot (bottom left) and its max-spectrum
(bottom right). The max-spectrum yields an estimate $\hat\alpha = 1.66$ obtained with an automatic
selection of the scale $j_1$ by using a tuning parameter $p = 0.01$ (see Section 5.2), and the
Hill plot yields an estimate $\hat\alpha_H(k) = 1.39$ for $k = 1,000$. This discrepancy between the two
methods is interesting since they yield comparable results in many typical models (see Section
5.1 above). To explore further the significance of this difference, we resort to calculating
confidence intervals.

A particular advantage of the max-spectrum type estimators is that one can naturally obtain
the following two types of confidence intervals for the parameters $H$ and $\alpha = 1/H$: (i) based
on the asymptotic normal distribution (see Proposition 4.2) and (ii) based on a permutation
bootstrapping procedure. We will only briefly describe the procedure for obtaining permutation
bootstrap confidence intervals. Its theoretical analysis is outside the scope of the present paper.

[Figure 6 panel titles: "Danish Fire Loss Data: 1980 - 1990" (top); "Hill plot: $\hat\alpha_H(k) = 1.394$", plotted against order statistics (bottom left); "$\hat H = 0.60422$ (0.020897), $\hat\alpha = 1.655$", plotted against scales $j$ (bottom right).]

Fig 6. Top panel: time series of insurance losses due to fire in Denmark from 1980 to 1990
(in million Danish kroner). Bottom left panel: the Hill plot of the fire loss data set. Bottom right: the
max-spectrum of the data. Note that the Hill estimate is $\hat\alpha_H(k) = 1.39$, with $k = 1,000$ and the max
self-similarity estimate is $\hat\alpha = 1.66$.

[Figure 7 panel titles: "Danish Fire Loss: 95% confidence intervals for $H = 1/\alpha$" (left); "Danish Fire Loss: 95% confidence intervals for $\alpha$" (right).]

Fig 7. Left panel: 95% confidence intervals for $H = 1/\alpha$ based on: (i) permutation-bootstrap from 10,000
independent permutations and (ii) asymptotic distribution for the max self-similarity estimators. Right
panel: 95% confidence intervals for $\alpha = 1/H$ obtained by inverting the confidence intervals in the left
panel. The horizontal lines indicate the estimated values $\hat H = 0.6$ and $\hat\alpha = 1/\hat H = 1.66$ for $H$ and $\alpha$,
respectively, obtained with the max self-similarity estimator in Figure 6.

Permutation bootstrap confidence intervals

Given an i.i.d. sample $X(1), \ldots, X(N)$, generate $M$ independent random permutations
$\pi_i : \{1, \ldots, N\} \to \{1, \ldots, N\}$, $i = 1, \ldots, M$. Then, construct the permuted samples
$X_i(1), \ldots, X_i(N)$, $i = 1, \ldots, M$, where $X_i(k) = X(\pi_i(k))$, $k = 1, \ldots, N$. Fix a range of scales
$j_1 < j_2 \le \log_2 N$ and for each $i = 1, \ldots, M$, compute the GLS max self-similarity estimator
$\hat H_i = \hat H_i(j_1, j_2)$ from the permuted sample $X_i(1), \ldots, X_i(N)$. We will refer to the sample
$\hat H_i$, $i = 1, \ldots, M$ as the permutation bootstrap sample of the estimator $\hat H = \hat H(j_1, j_2)$, based
on the original data set $X(1), \ldots, X(N)$.

Observe that the statistics $\hat H_i$, $i = 1, \ldots, M$ are mutually dependent, since they are based on
the original sample $X(1), \ldots, X(N)$. However, since the $X(k)$'s are i.i.d. and the permutations
$\pi_i$ are independent, we have that $\hat H_i \stackrel{d}{=} \hat H(j_1, j_2)$, for all $i = 1, \ldots, M$. One has moreover that
the sequence $\hat H_i$, $i = 1, \ldots, M$ is exchangeable. This suggests using the permutation bootstrap
sample $\hat H_1, \ldots, \hat H_M$ as a proxy to the sampling distribution of $\hat H$. We thus propose to use
the empirical confidence interval based on the permutation bootstrap sample as a confidence
interval for $H$. Corresponding bootstrap confidence intervals for $\alpha = 1/H$ are obtained through
the inversion method.
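The permutation bootstrap just described can be sketched as follows. This is an illustrative sketch under stated assumptions: the estimator is an ordinary least squares slope of the max-spectrum (a stand-in for the GLS estimator), the empirical interval is taken as a simple percentile interval, and the simulated $\alpha$-Fréchet data, $M = 200$, $j_1 = 1$ and $j_2 = 7$ are arbitrary choices made to keep the run short. The function names are hypothetical.

```python
import math
import random


def max_spectrum(x, j_max):
    """Y_j = mean of log2 block-maxima over blocks of size 2^j, j = 1..j_max."""
    spectrum = []
    for j in range(1, j_max + 1):
        m = 2 ** j
        n_blocks = len(x) // m
        log_maxima = [math.log2(max(x[i * m:(i + 1) * m])) for i in range(n_blocks)]
        spectrum.append(sum(log_maxima) / n_blocks)
    return spectrum


def h_ols(x, j1, j2):
    """OLS slope of the max-spectrum over scales j1..j2 (stand-in for GLS)."""
    y = max_spectrum(x, j2)[j1 - 1:]
    t = list(range(j1, j2 + 1))
    t_bar, y_bar = sum(t) / len(t), sum(y) / len(y)
    sxy = sum((a - t_bar) * (b - y_bar) for a, b in zip(t, y))
    return sxy / sum((a - t_bar) ** 2 for a in t)


def permutation_bootstrap(x, j1, j2, m_perm, rng):
    """Sorted estimates H_i computed from m_perm random permutations of x."""
    estimates = []
    for _ in range(m_perm):
        permuted = rng.sample(x, len(x))  # X_i(k) = X(pi_i(k))
        estimates.append(h_ols(permuted, j1, j2))
    return sorted(estimates)


def percentile_interval(sorted_sample, level=0.95):
    """Empirical interval from the order statistics of the bootstrap sample."""
    tail = (1.0 - level) / 2.0
    last = len(sorted_sample) - 1
    lo = sorted_sample[int(math.floor(tail * last))]
    hi = sorted_sample[int(math.ceil((1.0 - tail) * last))]
    return lo, hi


rng = random.Random(0)
alpha = 1.5
# alpha-Frechet data: (1/E)^(1/alpha) with E a unit exponential variable.
data = [(1.0 / rng.expovariate(1.0)) ** (1.0 / alpha) for _ in range(2 ** 11)]

h_sample = permutation_bootstrap(data, j1=1, j2=7, m_perm=200, rng=rng)
h_lo, h_hi = percentile_interval(h_sample)
alpha_lo, alpha_hi = 1.0 / h_hi, 1.0 / h_lo  # inversion method for alpha = 1/H
print(f"H in [{h_lo:.3f}, {h_hi:.3f}], alpha in [{alpha_lo:.3f}, {alpha_hi:.3f}]")
```

Permuting the sample changes which observations fall in the same block, so the block-maxima, and hence the estimates $\hat H_i$, vary from one permutation to the next even though the underlying data are fixed.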

Experience with several simulation experiments suggests the following conjecture.

Conjecture 5.1 Let $\hat H_i$, $i = 1, \ldots, M$ be a permutation bootstrap sample of the estimator
$\hat H(j_1, j_2)$. Consider the scales $j_1$, $j_2$ and the permutation sample size $M$ as functions of the
sample size $N$, which tend to infinity as $N \to \infty$.

Under certain conditions on the rates of growth of $j_1$, $j_2$ and $M$, the empirical distribution of
the permutation bootstrap sample $\hat H_i$, $i = 1, \ldots, M$ yields asymptotically consistent confidence
intervals for $H$.

Figure 7 displays 95% confidence intervals for $H$ (left panel) and $\alpha = 1/H$ (right panel)
for the Danish fire loss data. Different scales $j_1$ were used and $j_2$ was chosen as the largest
available scale, 11. The permutation confidence intervals (denoted by dots) are obtained from
$M = 10,000$ random permutations and the asymptotic confidence intervals (denoted by circles)
are obtained from the asymptotic variance in Proposition 4.2, where the unknown value of $H$ was
replaced by $\hat H$. To be able to compare the two types of intervals, we centered the asymptotic
confidence intervals at the means of the permutation bootstrap samples $\hat H_i$, $i = 1, \ldots, M$.
Observe that although the two procedures for constructing confidence intervals are different,
they yield very similar results. The permutation bootstrap intervals are always slightly
narrower than the asymptotic ones. As Figure 7 indicates, the use of scales $j_1 = 1$ and $j_2 = 11$
is acceptable. The resulting permutation and asymptotic confidence intervals for $H$ are
[0.5880, 0.6361] and [0.5710, 0.6540], respectively. They are consistent with, but considerably
tighter than, the likelihood-based intervals in Figure 8 of Lu and Peng (2002) for the same data
set. This can be attributed to the fact that the max-spectrum estimators and the Hill-type
estimators are based on different principles. The performance of the permutation bootstrap
and asymptotic confidence intervals is addressed in more detail in Stoev et al. (2006).
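As a small worked example of the inversion method, the two intervals for $H$ quoted above can be mapped to intervals for $\alpha = 1/H$. Since $H \mapsto 1/H$ is decreasing on $(0, \infty)$, the endpoints swap. The helper name `invert_interval` is hypothetical.

```python
def invert_interval(h_lo, h_hi):
    """Map a confidence interval [h_lo, h_hi] for H > 0 to one for alpha = 1/H."""
    return 1.0 / h_hi, 1.0 / h_lo


# Intervals for H reported in the text for the Danish fire loss data.
perm_alpha = invert_interval(0.5880, 0.6361)
asym_alpha = invert_interval(0.5710, 0.6540)

print("permutation bootstrap:", tuple(round(v, 3) for v in perm_alpha))  # (1.572, 1.701)
print("asymptotic:           ", tuple(round(v, 3) for v in asym_alpha))  # (1.529, 1.751)
```

Both inverted intervals contain the max self-similarity estimate $\hat\alpha = 1.66$ reported for these data.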

The second data set to be analyzed in this section consists of the volumes in trillion cubic
feet of the 406 largest natural gas world provinces. The data were obtained from Table 1 in
(n.d.). The study of the patterns in such data will help in the development of future natural
gas resources, leading to better assessments of the reserve growth potential of the world's
provinces. The max self-similarity estimator, obtained from a typical randomly permuted
sample, is $\hat\alpha = 1.284$ (Figure 8). Observe that the Hill plot shown in the bottom-left panel of
Figure 8 is very volatile and appears to stabilize in a narrow range around $k = 60$, where the
resulting estimator is $\hat\alpha_H(60) = 0.826$. Notice that the integer nature of the observations makes
the Hill plot exhibit a saw-tooth like pattern and hence makes it difficult to obtain a good estimate
for $\alpha$. Due to the discrepancy between the two methods, obtaining confidence intervals becomes
particularly pertinent.

Permutation bootstrap and asymptotic confidence intervals for the max self-similarity
estimators for $H = 1/\alpha$ and $\alpha$ are presented in Figure 9. As in Figure 7, the asymptotic confidence
intervals are slightly wider than the ones based on the permutation bootstrap. Observe that,
contrary to the case of fire loss data in Figure 7, the locations of the confidence intervals for
the gas data set stabilize only at scales $j \ge 4$. This indicates that the value $\hat\alpha = 1.284$, obtained
from the range of scales $j_1 = 4$ and $j_2$ in Figure 8, is credible. The fact that the resulting Hill
estimate $\hat\alpha_H(60) = 0.826$ is less than 1 appears not to be statistically significant, according to
the confidence intervals in Figure 9, which is in line with the findings in de Sousa and
Michailidis (2004). This last fact and the volatility of the Hill plot suggest that the max self-similarity
estimators can be viewed as more reliable in this setting.

[Figure 8 panel title: "Natural Gas Reserves in 406 Oil Fields".]

Fig 8. Top panel: randomly permuted sample of volumes of natural gas reserves (in trillion cubic feet) found
in 406 provinces. Bottom left panel: the Hill plot of the data set. Bottom right panel: the max-spectrum
of the data. Note that the Hill estimate is $\hat\alpha_H(k) = 0.826$, with $k = 60$ and the max self-similarity
estimate is $\hat\alpha = 1.284$.

[Figure 9 panel titles: "Natural Gas Reserves: 95% confidence intervals for $H = 1/\alpha$" (left); "Natural Gas Reserves: 95% confidence intervals for $\alpha$" (right).]

Fig 9. Left panel: 95% confidence intervals for $H = 1/\alpha$ based on: (i) permutation-bootstrap from 10,000
independent permutations and (ii) asymptotic distribution for the max self-similarity estimators. Right
panel: 95% confidence intervals for $\alpha = 1/H$ obtained by inverting the confidence intervals in the left
panel. The horizontal lines indicate the estimated values $\hat H = 0.78$ and $\hat\alpha = 1/\hat H = 1.28$ for $H$ and $\alpha$,
respectively, obtained with the max self-similarity estimator in Figure 8.

6. Concluding remarks 

In this paper, a new estimator for the tail exponent of a distribution was introduced and its
asymptotic properties established. The estimator is based on block-maxima of the data and
can be visualized through a new graphical device called the max-spectrum plot. Numerical
work shows that, compared to the widely used Hill estimator, the max self-similarity estimator
performs competitively in the case of the Pareto distribution and outperforms the Hill
estimator in the cases of the stable, Fréchet and certain t-distributions. In practice, the
max-spectrum plot is less volatile than the classical Hill plot. Thus, the max self-similarity estimator
can be used in situations where the Hill plot fails or is hard to interpret. Finally, the
fact that the estimator is based on block maxima makes it particularly suitable for time series
data, a topic discussed in a companion paper, Stoev et al. (2006).

7. Appendix: auxiliary results and tables 
7.1. Auxiliary results 

We briefly review some properties of the $\alpha$-Fréchet distributions used above.

Definition 7.1 A random variable $Z$ is said to have an $\alpha$-Fréchet distribution, if

$$ P\{Z \le x\} = \begin{cases} \exp\{-\sigma^\alpha x^{-\alpha}\}, & x > 0, \\ 0, & x \le 0, \end{cases} \qquad (7.1) $$

with $\sigma > 0$. The parameter $\sigma$ is referred to as the scale coefficient of $Z$. The random variable
$Z$ is said to be standard $\alpha$-Fréchet if $\sigma = 1$.

Let $Z$ be an $\alpha$-Fréchet variable with scale coefficient $\sigma > 0$. The next properties follow
directly from Relation (7.1).

Properties

1. (scale family) For all $c > 0$, the random variable $cZ$ is $\alpha$-Fréchet and has scale coefficient
$c\sigma$.

2. (heavy tails) The Taylor expansion of the exponential around the origin implies that

$$ P\{Z > x\} = 1 - e^{-\sigma^\alpha x^{-\alpha}} \sim \sigma^\alpha x^{-\alpha}, \quad \text{as } x \to \infty. \qquad (7.2) $$

3. (moments) In view of (7.2), for all $p > 0$,

$$ E Z^p < \infty \ \text{ if and only if } \ p < \alpha. $$

One has moreover, that $E Z^p = \sigma^p \Gamma(1 - p/\alpha)$, $p \in (0, \alpha)$, with $\Gamma(x) = \int_0^\infty u^{x-1} e^{-u}\,du$, $x > 0$.

4. (log-moments) For all $p > 0$, the moments $E|\ln Z|^p$ are finite. This follows from the fact
that $\xi := \alpha \ln(Z/\sigma)$ has the Gumbel distribution, i.e. $P\{\xi \le x\} = \exp\{-e^{-x}\}$, $x \in \mathbb{R}$.
See also Corollary 3.1.

5. (power transformations) For any $p > 0$, the random variable $Z^p$ is $\alpha/p$-Fréchet with
scale coefficient $\sigma^p$. Consequently, if $Z_1$ is a standard 1-Fréchet variable, then

$$ Z := Z_1^{1/\alpha} $$

is standard $\alpha$-Fréchet, for all $\alpha > 0$.
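The properties above are easy to check by simulation. The sketch below is an illustration under stated assumptions rather than part of the paper: it samples a standard 1-Fréchet variable as the reciprocal of a unit exponential variable (a standard inverse-transform construction), then applies the power and scale transformations of Properties 1 and 5. The parameter values $\alpha = 4$, $\sigma = 2$ and the function names are arbitrary choices.

```python
import math
import random


def frechet_cdf(x, alpha, sigma=1.0):
    """C.d.f. (7.1) of an alpha-Frechet variable with scale coefficient sigma."""
    return math.exp(-(sigma / x) ** alpha) if x > 0 else 0.0


def frechet_sample(alpha, sigma, size, rng):
    """Draw sigma * Z1**(1/alpha), where Z1 = 1/E is standard 1-Frechet, E ~ Exp(1)."""
    return [sigma * (1.0 / rng.expovariate(1.0)) ** (1.0 / alpha) for _ in range(size)]


rng = random.Random(0)
alpha, sigma, size = 4.0, 2.0, 100_000
z = frechet_sample(alpha, sigma, size, rng)

# Definition (7.1): empirical c.d.f. at x = 3 versus the formula.
ecdf_at_3 = sum(1 for v in z if v <= 3.0) / size

# Property 2: the exact tail divided by its power-law approximation tends to 1.
t = (sigma / 20.0) ** alpha
tail_ratio = -math.expm1(-t) / t

# Property 3 with p = 1 < alpha: E Z = sigma * Gamma(1 - 1/alpha).
mean_theory = sigma * math.gamma(1.0 - 1.0 / alpha)
mean_empirical = sum(z) / size

# Property 4: xi = alpha * ln(Z / sigma) is Gumbel, so P{xi <= 0} = exp(-1).
gumbel_at_zero = sum(1 for v in z if alpha * math.log(v / sigma) <= 0.0) / size

print(f"ecdf(3) = {ecdf_at_3:.4f} vs F(3) = {frechet_cdf(3.0, alpha, sigma):.4f}")
print(f"mean    = {mean_empirical:.4f} vs theory = {mean_theory:.4f}")
print(f"P(xi<=0) = {gumbel_at_zero:.4f} vs exp(-1) = {math.exp(-1.0):.4f}")
```

The moment check uses $p = 1$ with $\alpha = 4$ so that $Z$ has finite variance and the sample mean converges at the usual rate. For $p$ close to $\alpha$ the same comparison would be far noisier.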
The $\alpha$-Fréchet distributions are also max-stable in the following sense.

Definition 7.2 A random variable $Z$ is said to be max-stable, if for all $a, b > 0$ there exist
$c > 0$, $d \in \mathbb{R}$, such that

$$ \max\{aZ', bZ''\} \stackrel{d}{=} cZ + d, $$

where $Z'$ and $Z''$ are independent copies of $Z$ and where $\stackrel{d}{=}$ means equality in distribution.

In particular, by (7.1), one gets that if $Z(1), \ldots, Z(n)$, $n \in \mathbb{N}$ are i.i.d. $\alpha$-Fréchet, then

$$ Z(1) \vee \cdots \vee Z(n) \stackrel{d}{=} n^{1/\alpha} Z(1). \qquad (7.3) $$

This last relation shows that a sequence of i.i.d. $\alpha$-Fréchet variables is also max self-similar
with parameter $H = 1/\alpha$ (see Definition 2.1 above). Relation (7.3) served as the main
motivation to define the max self-similarity estimators in Section 2 above.

The class of max-stable distributions in the sense of Definition 7.2 above includes, in addition
to the Fréchet, only the classes of negative Fréchet and the Gumbel laws. These three classes of
distributions are the only distributions arising in the limit of maxima of i.i.d. variables under
appropriate normalization (see e.g. Proposition 0.3 in Resnick (1987) and also Leadbetter,
Lindgren and Rootzen (1983)).

The following integration by parts formula is used in the proof of Theorem 3.1.

Lemma 7.1 Let $f : [a, b] \to \mathbb{R}$, $a, b \in \mathbb{R}$ be an absolutely continuous function, that is,
$f(x) = f(a) + \int_a^x f'(u)\,du$, for some Lebesgue integrable $f'(x)$, $x \in [a, b]$. Then, for any c.d.f. $G(x)$, we
have

$$ \int_a^b f(x)\,dG(x) = f(b)G(b) - f(a)G(a) - \int_a^b f'(x)G(x)\,dx. \qquad (7.4) $$

Proof: Since $f(x) = f(a) + \int_a^b f'(u) 1_{[a,x)}(u)\,du$, we have that

$$ \int_a^b f(x)\,dG(x) = f(a)G(b) - f(a)G(a) + \int_a^b \Big( \int_a^b f'(u) 1_{[a,x)}(u)\,du \Big) dG(x). $$

An application of Fubini's theorem yields

$$ \int_a^b f(x)\,dG(x) = f(a)G(b) - f(a)G(a) + \int_a^b f'(u)\big(G(b) - G(u)\big)\,du $$

$$ = f(a)G(b) - f(a)G(a) + \big(f(b) - f(a)\big)G(b) - \int_a^b f'(u)G(u)\,du. $$

Observe that the right-hand sides of the last expression and Relation (7.4) coincide. □
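Two quick sanity checks of the formula may be helpful. They read Relation (7.4) as $\int_a^b f\,dG = f(b)G(b) - f(a)G(a) - \int_a^b f'(x)G(x)\,dx$, which is what the last display of the proof simplifies to, and they assume that $\int_a^b \cdot\, dG$ denotes integration over $(a, b]$. The choices of $f$ and $G$ are illustrative only.

```latex
% Example 1: a = 0, b = 1, f(x) = x^2, and G(x) = x on [0,1] (uniform c.d.f.).
\int_0^1 x^2 \, dG(x) = \tfrac{1}{3},
\qquad
f(1)G(1) - f(0)G(0) - \int_0^1 2x \cdot x \, dx = 1 - 0 - \tfrac{2}{3} = \tfrac{1}{3}.

% Example 2: a = 0, b = 1, general f, and G(x) = 1_{[1/2,\infty)}(x)
% (unit point mass at 1/2), so that G(0) = 0 and G(1) = 1.
\int_{(0,1]} f(x) \, dG(x) = f(1/2),
\qquad
f(1) \cdot 1 - f(0) \cdot 0 - \int_{1/2}^{1} f'(x) \, dx
  = f(1) - \bigl( f(1) - f(1/2) \bigr) = f(1/2).
```

The second example illustrates that no smoothness of $G$ is needed: the formula holds for a purely discrete c.d.f. as well.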



7.2. Tables

            i + 0       i + 1       i + 2       i + 3       i + 4
i = 0       3.423696    2.211864    1.387207    0.846734    0.504666
i = 5       0.294581    0.168963    0.095563    0.053288    0.029470
i = 10      0.016072    0.008755    0.004756    0.002552    0.001405
i = 15      0.000709    0.000335    0.000175    0.000097    0.000032

Table 7.1

We present here numerical approximations of the values, indexed by $i = 0, 1, \ldots, 19$, involved in the
expression of the covariance matrices $\Sigma_\alpha(j_1, j_2; N)$ in (2.14) (see also (2.15)). We used Monte Carlo
simulations with 10,000,000 independent pairs of 1-Fréchet variables. To reduce the variance of the
estimates we used "bagging". That is, the Monte Carlo simulations were repeated independently 1,000
times and then the resulting means were taken as the final estimates reported in the table above.



$j$                        2      3      4      5      6      7      8      9      10     11
$\sqrt{c_w(j)}$            1.417  0.802  0.515  0.346  0.238  0.166  0.116  0.082  0.058  0.041
$\sqrt{2^j c_w(j)}$        2.834  2.267  2.060  1.960  1.905  1.875  1.857  1.847  1.841  1.837

$j$                        12     13     14     15     16     17     18     19     20     21
$\sqrt{c_w(j)}$            0.029  0.020  0.014  0.010  0.007  0.005  0.004  0.003  0.002  0.001
$\sqrt{2^j c_w(j)}$        1.835  1.834  1.834  1.833  1.833  1.833  1.833  1.833  1.833  1.833

Table 7.2

We present here numerical estimates of the constants $c_w$ involved in the asymptotic variances in
Proposition 4.2 above. Here, we use $j_1 = 1$, for simplicity, and display 20 different values
corresponding to $j_2 = j = 2, \ldots, 21$. For convenience, we present $\sqrt{c_w}$ together with $\sqrt{2^{j_2} c_w}$, where the
latter constant is useful if one normalizes in (4.28) by using $\sqrt{N}$ instead of $\sqrt{N_{j_2}}$.